DATABASES IN HEALTH CARE


I.  DEFINITION  OF  THE  TECHNOLOGY 

In this chapter we will introduce the concepts of database technology in a
way that will make it easy to relate the terminology to problems in health
care. After the objectives have been defined, the major components of
databases and their functions will be discussed. The remainder of this
chapter will present the scientific and the operational issues associated
with databases.


I.A Databases and Their Objectives

A database is a collection of related data, organized so that
useful information may be extracted. The effectiveness of databases
derives from the fact that a single, comprehensive database can supply much
of the information relevant to a variety of organizational purposes.
In health care the same database may be used by medical
personnel for patient care recording, for surveillance of patient status,
and for treatment advice; it may be used by researchers in assessing the
effectiveness of drugs and clinical procedures; and it can be used by
administrative personnel in cost accounting and by management for the
planning of service facilities.

The fact that data are shared promotes consistency of information for
decision-making and reduces duplicate data collection. A major benefit
of databases in health care comes from applying the information
to the management of services and the allocation of resources needed for
those services, but communication among health care providers through the
shared information, and the validation of medical care hypotheses from
observations on patients, are also significant.

The contents and the description of a database have to be carefully managed
in order to provide for this wide range of services, so that some degree
of formal data management is implied when we speak of databases. The
formalization, and the large data quantity implied in effective database
operations, make computerization of the database function essential; in fact,
much of the incentive for early [Bush45] and current computing technology
[Barsam7D] is due to the demands made by information processing needs.

Hence, the notion of a database encompasses the data themselves, the hardware
used to store the data, and the software used to manipulate the data. When
the database is used for multiple purposes we also find an administration
which controls and assigns the resources needed to maintain the data
collection and permit the generation of information.

In the next section we will define the technical scope of databases. The
remaining sections in this chapter will deal specifically with current and
future applications of databases in health care.


I.B Terminology in the Area of Databases

Within the scope of databases are a number of concepts which are
easily confused with each other. The objective of a database is to provide
information, but not all systems that provide information are databases.

We will first define the term 'database', and then some terms that describe
aspects of database technology. In the section which follows we will present
types of systems which are related or similar to databases, but are not
considered databases within this review.

A database is a collection of related data,
with facilities that process these data to
yield information.

A database system facilitates the collection, organization, storage,
and processing of data. The processing of data from many sources can
provide information that would not have been available before the data
were combined into a database. Hence, a collection of data is not by
itself a database, a system that supports data storage is not necessarily
a database system, and not all the information provided by computer
systems is produced from databases.

I.B.1 Components of Databases

A database is hence composed both of data and of programs, or software, to
enter and manipulate the data. Both data and software are stored within the
computers which support the database, and the internal organization may
not be obvious to the users. We will now describe some of the components
that are part of database software. Databases require the availability of
certain technological tools, or software subsystems. Some of the tools
that are used to support databases can also be used independently,
and hence they are at times confused with the database system itself.
Important subsystems are:

a) File Storage Systems: software to allocate and manage space
   for data kept on large computer storage devices, such as disks or
   tapes.

b) File Access Methods: software to rapidly access and update
   data stored on those devices.

c) Data Description Languages: means to describe data so that
   users and machines can refer to data elements and aggregations
   of similar data elements conveniently and unambiguously.

d) Data Manipulation Languages: programs to allow the user to
   retrieve and process data conveniently.

In a database these subsystems have to be well integrated, so that the
data manipulation can be carried out in response to the vocabulary used
in the data descriptions. Storage is allocated and rearranged as new data
enter the database, and access to old and new data is provided as needed
for manipulation. To provide the necessary reliability, some redundant
backup data are stored separately and appropriately identified whenever the
database is changed. Optional software components of a database may
provide on-line, conversational access to the database, help with the
formulation of statistical queries, and provide printed reports on a regular
schedule.


I.B.2 File Management Systems versus Database Management Systems

Of primary concern to a database effort is the reliable operation of the
devices used to store the data over long periods of time. The programming
systems which provide such services, typically including the tools listed
in a) and b) above, are called file management systems (FMS).

When data are to be organized so that they can be accessed by a variety of
users, system control extending to the individual users, and to the specific
data units which these users will be referencing, may be needed. Control
over the data and their use can only be achieved if all users always access
the database via programs that protect the reliability, privacy, and
integrity of the database. We achieve reliability when data are not lost due
to hardware and software errors. We protect privacy when we guarantee that
only authorized access will occur. We define integrity as freedom from
errors that could be introduced by simultaneous use of the database by users
who may update its contents. A database management system (DBMS) should
provide all the required database support programs, including management of
files, scheduling of user programs, database manipulation, and recovery from
errors. All these should form a well integrated package.
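
The integrity problem defined above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the VisitRecord class and its methods are hypothetical, and a lock stands in for the scheduling a DBMS would provide. The first part replays, step by step, two users who both read a value before either writes it back; the second part serializes the two updates.

```python
import threading


class VisitRecord:
    """Hypothetical shared record holding a visit count for one patient."""

    def __init__(self):
        self.visits = 0
        self._lock = threading.Lock()

    def write_back(self, value_read_earlier):
        # Unprotected update: writes a value computed from an earlier read.
        self.visits = value_read_earlier + 1

    def add_visit(self):
        # Protected update: the read and the write happen as one unit.
        with self._lock:
            self.visits = self.visits + 1


# Unprotected case, replayed deterministically: both users read 0,
# then both write 1, so one of the two updates is lost.
record = VisitRecord()
seen_by_a = record.visits
seen_by_b = record.visits
record.write_back(seen_by_a)
record.write_back(seen_by_b)
lost_update_result = record.visits

# Protected case: two concurrent users, each update serialized by the lock.
record2 = VisitRecord()
workers = [threading.Thread(target=record2.add_visit) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
protected_result = record2.visits
```

In the unprotected replay the count ends at 1 although two visits were entered; with the lock it ends at 2.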

Not every database is managed by a commercial DBMS. Database support can
also be provided by programs that use one of the available file management
systems. The contents of the database can be identical for a system using
a generalized DBMS product or one using programs written specifically for
the task. A locally developed collection of programs rarely has all of
the protective features that are desirable when multiple users interact with
the database from terminals. The manner in which users gain access will
always depend on the choice of the DBMS or the file management system.

For instance, a file system does not provide automatic scheduling of
user-requested activities. Without a DBMS the users will have to schedule
their own activities in such a way that simultaneous data entry is avoided.
Some file systems will simply disallow such access; in other systems such
usage could lead to inconsistent data. If data entry activities are organized
so that such conflicts are avoided, then there is less need for the
complexity of a DBMS. A very popular file management system in medicine is
MUMPS, developed at Massachusetts General Hospital to support clinical use of
relatively small computer systems [Bowie77].

Both file management systems (FMS) and database management systems (DBMS)
are available commercially for most computers. Some DBMS's will make use
of an existing FMS, others will perform all but the most primitive file
access functions themselves. Since a DBMS interacts closely with the user
of the database, we find that distinct types of DBMS's have been developed.
DBMS's also differ in terms of the comprehensiveness of software services.
Most manufacturers provide an FMS at no additional cost, but acquisition
of a DBMS is rarely free.

The choice of a particular type of database management system will influence
the structure of the future database. Not every type of DBMS will be
available on a given computer, but for most medium to large computers there
is some choice. Simplicity often has to be traded off against generality and
cost. Even so-called generalized database management systems impose, to a
great extent, the view of the designer or sponsor of such a DBMS. Many of the
major systems now being marketed were designed to solve the complexities
of specific applications. We hence find DBMS's that excel in inventory
management, some that do excellent retrieval of bibliographic citations, and
others that have a strong bias towards statistical processing. Even within
the medical area different DBMS's will emphasize one of the many objectives
that are found within the range from patient care to medical research. The
following table will list some database systems found in medicine with an
indication of their objective. We distinguish in this table: general
ambulatory patient care, clinical or specialty outpatient care, hospital
inpatient care, or patient management and record keeping in these areas.
Clinical studies refers to research data collection on defined populations.
Guidance refers to the giving of medical advice during the inquiry process.
Details of these types of application are given in chapter 2 of this review.
The types of database organizations can be categorized as tabular,
relational, hierarchical, or network. These terms will be defined in section
B.4 of this chapter.


I.B.3 Related Systems

Data are collected and stored into a database with the expectation that at a
later time the data can be analyzed, conclusions can be drawn, and the
information obtained can be used to influence future actions. Information is
generated from data through processing, and should increase the knowledge of
the receiver of this information. This person then should have the means to
act upon the information, perhaps to the benefit of a larger community.

||  The  production  of  information  is  the  central  objective  of  a  database.  || 

There are other automated information processing systems which are not
considered databases, although they may share some of the technology. In the
remainder of this section two categories of such related systems will be
presented.

INFORMATION SYSTEMS store information - often the output of earlier data
analyses - for rapid selective retrieval [Beckle77]. A well known example
is the MEDLARS system [Katter75, Leiter77], a service of the National Library
of Medicine, which provides access to papers published in the medical
literature. The task of such an information system is the selection and
retrieval of information, but not the generation of information [Lucas78].
Index Medicus, for instance, only provides the references, and depends on the
user's own library [Kunz79]. Even maintenance of personal reference files
can be effectively automated [Reiche68]. The benefits are due to the speed
and improved coverage with which the documents can be found.

The boundary between information systems and database systems is not at all
absolute. One can perhaps even speak of a spectrum of system types. When
the queries are simple the two system types are in fact indistinguishable.
Retrieval of the age of a patient, for instance, can be carried out with
equal facility on either type of system. But when another observation, say
cholesterol level, has to be compared with the average cholesterol level for
all other patients of the same age, then a computation to generate this
information is needed, and a system which is able to do this is placed more
on the database side of the spectrum.
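
The distinction drawn above between retrieving a stored value and generating new information can be sketched as follows. This is a minimal illustration, not a description of any system named in this review; the record layout, the sample values, and the function names are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical patient records; the values are invented for illustration.
patients = [
    {"id": 1, "age": 52, "cholesterol": 210},
    {"id": 2, "age": 52, "cholesterol": 250},
    {"id": 3, "age": 52, "cholesterol": 230},
    {"id": 4, "age": 40, "cholesterol": 180},
]


def retrieve_age(records, patient_id):
    """Simple retrieval: return a stored value unchanged."""
    for r in records:
        if r["id"] == patient_id:
            return r["age"]
    raise KeyError(patient_id)


def cholesterol_vs_age_peers(records, patient_id):
    """Generated information: the patient's cholesterol level minus the
    average level of all OTHER patients of the same age, or None when
    there are no such patients."""
    target = next(r for r in records if r["id"] == patient_id)
    peers = [
        r["cholesterol"]
        for r in records
        if r["age"] == target["age"] and r["id"] != patient_id
    ]
    if not peers:
        return None
    return target["cholesterol"] - mean(peers)
```

The first function could be served by either kind of system; the second requires a computation over many records, which places it toward the database side of the spectrum.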


DECISION SUPPORT SYSTEMS assist with the manipulation of data supplied by
the user [Davis78]. The help may be principally algorithmic - perhaps
assuring that Bayes' rule is properly applied. More specialized systems
embody medical knowledge [Johnso79], for instance in acid-base balance
assessment [Bleich72] and anti-microbial therapy [Yu79]. While these systems
could be coupled to databases, so that they also become knowledgeable about
a specific patient, today they are typically separate [Gabrie78]. Work in
decision making for health care cost control has indicated a need for
database facilities in these applications [BrookW76].
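
As an illustration of the purely algorithmic help mentioned above, the following sketch applies Bayes' rule to a single diagnostic test. The function name, its parameters, and the numbers in the example are assumptions chosen for illustration; they are not taken from any of the systems cited.

```python
def posterior(prior, sensitivity, specificity):
    """Probability of disease given a positive test result, by Bayes' rule.

    prior        P(disease)
    sensitivity  P(positive | disease)
    specificity  P(negative | no disease)
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Invented example: a rare condition and a fairly accurate test.
example = posterior(prior=0.01, sensitivity=0.9, specificity=0.95)
```

With these invented numbers the posterior probability is only about 0.15, the kind of result that is easily misjudged without algorithmic support.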

The HELP system, at the LDS hospital in Salt Lake City, does keep a separate
file of clinical decision criteria and applies them to the patient database
as it is updated. The system then advises the physician to consider certain
actions or further diagnostic tests [Warner78]. As medical databases become
more reliable and comprehensive we can envisage increased exploitation of the
information contained in them by systems which embody medical knowledge.


I.C The Scientific Basis for Database Technology


The emergence of databases is not so much due to particular inventions, but
is a logical step in the natural development of computing technology. The
evolution of computational power began with the achievement of adequate
reliability of complex electronic devices. The mean-time-to-failure reached
several hours for powerful computers about 1955. At that point the concerns
moved to the development of programming languages, so that programs of
reasonable power could be written. These programs had the capacity to
process large quantities of data, and in the early sixties magnetic tape and
disk devices were developed to make the data available. Operating systems to
allocate storage and processing power to the programs became the next
challenge. By the late sixties these systems had matured so that multi-user
operation became the norm. As these foundations were laid it became feasible
to keep data available on-line, i.e., directly accessible by the computer
system without manual intervention, such as fetching and mounting computer
tapes. Now a variety of application programs can use those data as needed.

In  current  systems  valuable  data  can  be  kept  on-line  over  long  periods 
without  fear  of  loss  or  damage  to  the  database. 

I.C.1 The Schema

The one technical concept which is central to database management systems
is the schema. A schema is a formalized description of the data that are
contained in the database, available to the programs that wish to use the
data. All data kept in such a database are identified with a name, say DOB
for date-of-birth. With a schema it is sufficient for application programs
to specify the name of the data they wish to retrieve. A command may state:

date_of_birth = GET ( current_patient, DOB ) ;

The database system will use the schema to match the name of the requested
data. When a corresponding entry in the schema is found, the database system
can use information associated with the entry to determine where the
requested data have been stored, locate the data values, and retrieve them
into the application program area (date_of_birth) for analysis or display.
During this process it is possible to check that the requestor is authorized
to access the data. The DBMS may also have to change the data into a
representation that the program can handle [Feins78A]. Similar processes
are carried out by the DBMS when old data are to be updated and when new
data are to be added to the database.
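
The retrieval steps just described (match the name in the schema, check authorization, locate the stored value, convert its representation) can be sketched as below. This is a minimal sketch under loud assumptions: the SCHEMA and STORAGE structures, the role names, the stored date format, and the get function are all hypothetical and do not describe any particular DBMS.

```python
from datetime import date

# Hypothetical schema: data name -> where it is stored, how it is
# represented in storage, and which roles may read it.
SCHEMA = {
    "DOB": {
        "file": "patients",
        "field": 2,
        "stored_as": "yyyymmdd",
        "readers": {"physician", "researcher"},
    },
}

# Hypothetical storage: file name -> patient id -> raw field values.
STORAGE = {
    "patients": {"P001": ["Doe, Jane", "F", "19420317"]},
}


def get(patient_id, name, role):
    """Retrieve a named data element by way of the schema."""
    entry = SCHEMA.get(name)
    if entry is None:
        raise KeyError(f"{name} is not described in the schema")
    if role not in entry["readers"]:
        raise PermissionError(f"{role} may not read {name}")
    raw = STORAGE[entry["file"]][patient_id][entry["field"]]
    if entry["stored_as"] == "yyyymmdd":
        # Convert the stored string into a form the program can handle.
        return date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    return raw


date_of_birth = get("P001", "DOB", "physician")
```

The application names only the patient and the data element; the location, the access rule, and the stored representation all come from the schema entry.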

The schema is established before any data can be placed into the database
and embodies all the decisions that have been made about the contents and
the structure of the database. Each individual type of data element will
receive a reference name. The data to be kept under this name may be further
defined. The most important specification is whether the data are numeric,
a character string, or a code. Codes then need tables or programs for their
definition. Other schema entries give the format and length of the data
element, and perhaps the range of acceptable values. For observations of
body temperature the five descriptors might be:

TEMP, temperature in degrees C, numeric, XX.X, 36.8 to 44.8.
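
A schema entry of this kind can be sketched as a small data structure that also checks incoming values against the stated range. The SchemaEntry class and its field names are assumptions made for illustration; only the five descriptor values are taken from the TEMP entry above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaEntry:
    """Hypothetical schema entry holding the five descriptors."""
    name: str          # reference name
    description: str   # meaning and unit
    kind: str          # numeric, character string, or code
    fmt: str           # format and length
    low: float         # lowest acceptable value
    high: float        # highest acceptable value

    def accepts(self, value):
        """Return True when a numeric value lies within the stated range."""
        if self.kind != "numeric":
            raise NotImplementedError("only numeric entries are sketched")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return self.low <= value <= self.high


TEMP = SchemaEntry("TEMP", "temperature in degrees C", "numeric",
                   "XX.X", 36.8, 44.8)
```

A value such as 41.9 is accepted, while 98.6 (a reading mistakenly entered in degrees Fahrenheit) is rejected at data entry.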

The data elements so described will have to fit into a structure; a value by
itself, say TEMP = 41.9, is of course meaningless. This data element belongs
in an observation record, and the observation record must contain other
data elements, namely a patient identification (ID), a date, and a time.
These data elements, which are used to identify the entity described in the
record, constitute the ruling part; without these there is insufficient
information present to make the TEMPerature observation useful. The ruling
part data types ( ID, DATE, TIME ) will also appear in the schema.

The observation record may contain, in addition to TEMP, other dependent
data elements such as the pulse rate, the blood pressure, and the name of the
observer. The entire observation record can then be described as a list of
seven attributes, as follows:

Observations:  ID,  DATE,  TIME  >  TEMP,  PULSE,  BP,  OBSERVER; 

The first three attributes form the ruling part, the other four are the
dependent part; we separate the two parts with a > symbol. Each attribute
has associated with it a schema entry with the five descriptors shown for
the TEMP entry above. There will be other kinds of records in the database:
a patient demographic data record will exist in most databases we consider.
Here the only data element in the ruling part will be the ID field. This
record may be in part as below:

Patients: ID > PATIENT-NAME, ADDRESS, DOB, SEX, ... ;

Matching of the ID fields establishes the relationship between patient
demographic data records and the observation records. The known relationships
between record types should also be described in the schema, so that the use
of the schema is simplified [Manach75, Chang076].
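
The two record types and the matching of ID fields can be sketched as follows. The representation is an assumption chosen for brevity: each file is a mapping from a ruling part (a tuple of identifying values) to a dependent part, and all sample values and the function name are invented.

```python
# Patients: ID > PATIENT-NAME, DOB, SEX (ADDRESS omitted for brevity)
patients = {
    ("P001",): {"PATIENT-NAME": "Doe, Jane", "DOB": "1942-03-17", "SEX": "F"},
    ("P002",): {"PATIENT-NAME": "Roe, Richard", "DOB": "1950-11-02", "SEX": "M"},
}

# Observations: ID, DATE, TIME > TEMP, PULSE, BP, OBSERVER
observations = {
    ("P001", "1979-05-02", "08:00"):
        {"TEMP": 37.2, "PULSE": 72, "BP": "120/80", "OBSERVER": "Smith"},
    ("P001", "1979-05-02", "20:00"):
        {"TEMP": 41.9, "PULSE": 110, "BP": "135/85", "OBSERVER": "Jones"},
    ("P002", "1979-05-03", "09:30"):
        {"TEMP": 36.9, "PULSE": 64, "BP": "118/76", "OBSERVER": "Smith"},
}


def observations_with_demographics(patient_id):
    """Match observation records to one patient record on the ID field."""
    demographic = patients[(patient_id,)]
    rows = []
    ordered = sorted(observations.items(), key=lambda item: item[0])
    for (pid, day, time), dependent in ordered:
        if pid == patient_id:
            rows.append({
                "ID": pid, "DATE": day, "TIME": time,
                **dependent,
                "PATIENT-NAME": demographic["PATIENT-NAME"],
            })
    return rows
```

Because the ruling part is the key, a dependent value such as TEMP can never be stored without the ID, DATE, and TIME that give it meaning.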

We use three types of connections to describe relationships between records
[Wieder79]; their use is also sketched in the figure below.

a) The Identity Connection - used where the ruling parts are similar,
   but different groupings are described;
   for instance both hospital patients and diabetes clinic patients
   are patients with patient ID's, but may have different dependent
   data stored in their files.

b) The Reference Connection - used where there is a common descriptive
   record referred to by multiple data records;
   for instance the physician seen is a record type referred to
   from the patient's clinic visit records.

c) The Nest Connection - used where there are many subsidiary records of
   some type which depend on a higher level record;
   multiple nest connections define an association;
   for instance the multiple clinic visits of a specific patient,
   each with data on his temperature, blood pressure, etc. form
   a nest of the patient record.

An association occurs in the figure where a physician has
admitting privileges at one or more hospitals, and each hospital
grants admitting privileges to a number of physicians. The
admitting-privileges file has as ruling part both the physician's
ID and the hospital name; a dependent data element might be the
date the privilege was granted.


Associated with the connection types may be rules for the maintenance of
database integrity. Such rules can inform the database system that certain
update operations are not permissible, since they would make the database
inconsistent. For example, we would not want to add a clinic patient without
adding a corresponding record to the general patient file, if the patient
did not yet exist there. Similarly, deletion of a physician's record from
the database implies deletion of the associated admitting privileges.
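
The two integrity rules in this example can be tried out with a small sketch. It relies on SQLite through Python's standard sqlite3 module, which is a present-day tool and not one of the systems discussed in this review; the table and column names are assumptions. A foreign key plays the role of the identity connection rule, and ON DELETE CASCADE plays the role of the deletion rule for the association.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE clinic_patient (
    id TEXT PRIMARY KEY REFERENCES patient(id),
    clinic TEXT
);
CREATE TABLE physician (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE hospital (name TEXT PRIMARY KEY);
CREATE TABLE admitting_privilege (
    physician_id TEXT REFERENCES physician(id) ON DELETE CASCADE,
    hospital_name TEXT REFERENCES hospital(name),
    granted TEXT,
    PRIMARY KEY (physician_id, hospital_name)
);
""")

# Rule 1: a clinic patient may not be added without a general patient record.
try:
    conn.execute("INSERT INTO clinic_patient VALUES ('P009', 'diabetes')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Rule 2: deleting a physician also deletes that physician's privileges.
conn.execute("INSERT INTO physician VALUES ('D1', 'Dr. A')")
conn.execute("INSERT INTO hospital VALUES ('General')")
conn.execute(
    "INSERT INTO admitting_privilege VALUES ('D1', 'General', '1979-01-15')"
)
conn.execute("DELETE FROM physician WHERE id = 'D1'")
remaining_privileges = conn.execute(
    "SELECT COUNT(*) FROM admitting_privilege"
).fetchone()[0]
```

Stating the rules once, with the data description, means that every program updating the database is held to them.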


I.C.2 The Data Model

In order to provide guidance for the creator of the schema it is important
to have design tools. A large database can contain many types of records,
and even more relationships between the record types. These have to be
understood and used by a variety of people: the programmers who devise data
entry and analysis programs, the researchers who wish to explore the database
in order to formulate or verify new hypotheses, and the planners who wish to
use the data as a basis for modelling so that they can predict the response
to future actions. A variety of models exist [ACM76]; some models are
abstractions of the facilities that certain types of database management
systems can provide, others use more generalized, mathematical abstractions
to represent the data and their relationships. Recent work in database
research is directed towards improving the representation of the semantics
of the data [Hammer78, ElMasr79, Codd79] so that the constraints of the
relationships that exist in the real world can be used to verify the
appropriateness of data that are entered into the database.

Any reasonable model of the database can provide a common ground for
communication between users and implementors; without a model there is apt
to be an excess of detail [Wieder78]. An example of a data model for a
clinical database is shown below.

The nest (  ) connections indicate that there may be multiple
inferior instances for each superior instance.

The reference ( --> ) connections indicate that there may be
multiple references to each instance.

The identity ( ==> ) connection defines a subgroup.


I.C.3  Types  of  Database  Models 

A popular approach to database analysis distinguishes several categories
of databases. Database system implementations can be associated
with each category. These categories are represented by database
model types; the best known types are the

Relational model - derived from the mathematical theory of
relations and sets.

Hierarchical model - related to tree-shaped database implementations,
similar to corporate organization diagrams.

Network model - permits interconnections that are more complex
than hierarchies, based on a definition developed by a
committee of specialists in commercial system languages.

The structural model can describe the structures of any of these three
models, as well as of other database implementations. If only a single
record-type - a box in the above diagram - is implemented then we are
dealing with a 'universal relation' [Ullman79]. A single box for a
complex database would have many columns and rows, and contain many null
entries. If the data are organized into several record-types, each
corresponding to some meaningful entity, then we are dealing with a
'tabular database'; if a completely general query and processing capability
exists in such a system, we have implemented the 'relational model'
[Codd70].

At this point the entities stand alone, and some analysis is needed to
relate them. If any of the indicated connections have been implemented then
we may have a network or a hierarchical database. In the hierarchical model
a record-type may have only one nest connection (  ) pointing to it. The
implementation of multiple nest connections, which creates a network with
associations, is considerably more complex [Stoneb75, Wieder77]. Several of
the larger commercial DBMS's are based on work by the Data Base Task Group
of CODASYL, and do support such network structures [Olle78]. These systems
often do not support the general inquiry capability of the relational model
implementations.
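
The criterion given above for the hierarchical model can be expressed as a short check. This is a sketch of that one stated criterion only (at most one nest connection pointing to each record-type); the function name and the sample record-type names are assumptions, and a full test for a hierarchy would also have to rule out cycles.

```python
from collections import Counter


def is_hierarchical(nest_connections):
    """Return True when no record-type has more than one nest connection
    pointing to it. Connections are (owner, member) pairs of type names."""
    incoming = Counter(member for _owner, member in nest_connections)
    return all(count <= 1 for count in incoming.values())


# Hypothetical clinic structure: visits nest under patients,
# and observations nest under visits.
clinic = [("Patients", "Visits"), ("Visits", "Observations")]

# The admitting-privileges association has two nest connections pointing
# to the same record-type, which makes the structure a network.
privileges = [
    ("Physicians", "AdmittingPrivileges"),
    ("Hospitals", "AdmittingPrivileges"),
]
```

Here is_hierarchical(clinic) holds, while is_hierarchical(privileges) does not, matching the remark that associations require a network implementation.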

It is important to note that there is a distinction between a model and its
implementation. A model is an abstraction and provides a level of insight
which can cut through masses of confusing detail. In the implementation
this detail has to be considered. It is likely that the implementation will
differ considerably from the model used to describe it. As more powerful
models are developed this distinction may become greater. An implementation
may then be best described in terms of transformations that are applied to
the model which defines the database at a high conceptual level. Most
transformations are done for reasons of operational performance and
reliability.


I.D Database Operation

In the section above we have discussed the scientific basis of databases.
In order to use and benefit from that science a database operation has to
be established, and that involves many decisions of practical, but critical
importance. This section will consider such topics.

When the database design has been established, and a suitable software
system has been obtained, then data collection can commence. Data are
often obtained partially from sources that were in existence before a
database was considered. To complete the database, so it can serve the
intended broad scope, new data collection points may have to be defined.

The value of adding data to the database has to be considered, since data
collection and entry are costly and susceptible to errors. We will begin
with a discussion of issues in entering data, and then proceed to data
storage and organization concerns, discuss data presentation issues, and
finish with some remarks about database administration.

I.D.1 Entering Data into the Database

The relatively high cost of data entry is a major concern. It is obvious
that collecting data that cost more to collect than they are worth should be
avoided. When a certain data element is entered its utility is hard to
predict: its usefulness may depend on its value, on the completeness of this
patient's record, and on the patient's returning to the clinic, so that
follow-up is possible. These factors are not easy to control. The actual
problem of data acquisition can, however, be addressed. Much less formal
attention has been given in the literature to this subject than to the topic
of data retrieval [Greenf76].

When data are to be collected there are the costs of the actual collection,
of the transcription to some processable form, and of the actual entry into
a computer. The data collection is to a great extent the physician's task.
While automated clinical instruments can collect objective values, and the
patients themselves can enter their own history [Slack66], many subjective
and important findings emanate from the physician.

It may be considered desirable to minimize changes to the traditional
manner of medical data recording, so that the physicians continue to collect
their findings as notes in free text or by dictation. These reports are
then transcribed by clerical personnel into the computer. This format
presents the medical information in a way that is least affected by
mechanical restrictions. To enable retrieval of such observations the
specific statements or paragraphs may be categorized into functional
groups such as findings, treatment, plans, etc. as proposed by [Korein71]. A
system based on these concepts has served well in a city hospital
pediatric clinic setting. Of particular importance was that patient data
retrieval for emergency and unscheduled visits became possible [Lyman76].

When textual data are to be used for analysis, we find that they are nearly
impossible to process in the form they were entered. An immediate problem
is that the natural language text has to be parsed so that its meaning
can be extracted. Both the parsers and the associated dictionaries are
substantial pieces of software. But even when language understanding is
achieved, consistent data for entry may not have been obtained since
medical terminology varies over time and among health care providers. In
general some encoding is needed. It may then be of benefit both to the
physician and to the system to choose a method of data collection which
encodes data immediately into a more rigorous form. Various choices exist
to encode data:

1.  The encoding can be carried out by clerical personnel [Valbon79].

2.  Natural language, i.e. English text, may be analyzed and converted
by a program that processes the text within the medical context
[Pratt73, Okubu75].

3.  A constrained set of keywords for data values, for example the list:

{no, light, moderate, serious},

can be attached to the schema entry for a specific data type.
These data values will be converted on data entry to an internal
code [Wieder75].

4.  Where the number of possible data elements, for which data are to be
collected, is large, the name of the data element, e.g. 'facial rash',
may be encoded in addition to the data value itself [Kammon73, Wong78].

5.  Keywords may be checked on a form or selected from a menu presented
on a display screen [Schult76]. Selection can be accomplished using
touch-sensitive screens, lightpens, cursors operated by joysticks or
key-pads, or by entering on a keyboard a digit which refers to a
line of the presented menu.

6.  Where the list of keywords is too long for screen presentation
a hierarchical menu selection can be provided or a subset of the
keywords corresponding to a few initial letters can be displayed
[Morgan77].

7.  The forms or menus to be used for data collection may be generated
using the schema of the database management system [Hanley78].
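Choices 3 and 6 can be illustrated with a minimal Python sketch. It attaches
a constrained keyword list to a schema entry, converts an entered keyword to
a small internal code, and lists the keywords that match a few initial
letters. The names SchemaEntry, encode, decode, and candidates, and the data
element rash_severity, are hypothetical and chosen only for illustration; the
cited systems may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaEntry:
    """Hypothetical schema entry: a data element and its permitted keywords."""
    name: str
    keywords: tuple  # the position in this tuple serves as the internal code

    def encode(self, keyword: str) -> int:
        """Convert an entered keyword to its internal code, rejecting others."""
        word = keyword.strip().lower()
        if word not in self.keywords:
            raise ValueError(f"{keyword!r} is not permitted for {self.name}")
        return self.keywords.index(word)

    def decode(self, code: int) -> str:
        """Recover the keyword from the stored code for presentation."""
        return self.keywords[code]

    def candidates(self, prefix: str) -> list:
        """Keywords starting with a few initial letters, for a short menu."""
        start = prefix.strip().lower()
        return [k for k in self.keywords if k.startswith(start)]


# The keyword list is the one quoted in choice 3 above.
severity = SchemaEntry("rash_severity", ("no", "light", "moderate", "serious"))
code = severity.encode("Moderate")
```

Storing the position in the list rather than the keyword is one simple way to
obtain the internal code; it also gives the compaction mentioned later, since
a short code stands for a long keyword.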

With the continuing development of fast display technology the latter choices
seem to have the most promise. The response for screen selection and
presentation of the next menu has to be extremely rapid (0.4 sec. per screen
is cited) to encourage direct physician use of the devices [Watson77]. Such
speeds are very difficult to achieve today, since the display frames reside
on remote disk storage devices and have to be fetched, formatted, and
transmitted by file, user, and communication programs for presentation on
terminals. When those terminals are connected via telephone lines to the
computer another bottleneck appears. To transmit a display frame of 24 lines
of 58 characters each, at the fastest available rate, 9600 bits/second, still
requires about one second. To cope with this problem either special
communication lines or storage devices local to the terminal are needed.
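The transmission estimate can be checked with a few lines of Python. This is
a sketch under stated assumptions: the line rate is taken as 9600
bits/second, each character is assumed to occupy 8 bits, and start/stop bits
and protocol overhead are ignored. The function name frame_transmit_seconds
is hypothetical.

```python
def frame_transmit_seconds(lines: int, chars_per_line: int,
                           bits_per_second: float,
                           bits_per_char: int = 8) -> float:
    """Time to send one full display frame over a serial line."""
    total_bits = lines * chars_per_line * bits_per_char
    return total_bits / bits_per_second


# Frame size as quoted in the text; 8 bits per character is an assumption.
seconds = frame_transmit_seconds(24, 58, 9600)
target = 0.4  # response time per screen cited in the text
exceeds_target = seconds > target
```

Under these assumptions the frame takes a little over one second, well above
the cited 0.4 second target, before any time for fetching and formatting is
counted.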

Numeric values are not as easily entered on a touch-screen as are choices
among discrete elements. Keyboard entry may continue to dominate this part
of data entry, unless the values can be obtained directly from medical
instrumentation. Typed data require much editing. Comprehensive commands
for specification of input editing are part of the MUMPS language, and
have contributed greatly to its acceptance. Modern computer languages, such
as PASCAL, also provide within the variable declarations a capability to
limit the range or the set of choices of values to be entered.
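The idea of limiting entered values to a declared range or set can be
sketched in Python as follows. This is not MUMPS or PASCAL syntax; the
helper names make_range_check and make_set_check, and the particular limits
shown, are hypothetical and for illustration only.

```python
def make_range_check(low: float, high: float):
    """Return a checker that accepts only numbers within low..high."""
    def check(text: str) -> float:
        value = float(text)  # raises ValueError on non-numeric input
        if not low <= value <= high:
            raise ValueError(f"{value} is outside the range {low}..{high}")
        return value
    return check


def make_set_check(choices):
    """Return a checker that accepts only one of the listed choices."""
    allowed = frozenset(choices)

    def check(text: str) -> str:
        value = text.strip().lower()
        if value not in allowed:
            raise ValueError(f"{text!r} is not one of {sorted(allowed)}")
        return value
    return check


# Illustrative limits only; real limits belong in the schema.
check_systolic = make_range_check(40, 300)
check_severity = make_set_check({"no", "light", "moderate", "serious"})
accepted = check_systolic("120")
```

A typing error such as "1200" is then rejected at entry time, when the
source of the value is still at hand.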


I.D.2  Data Storage

The cost of data storage is now much lower than the cost of data entry. This
means that if data entry is worthwhile, the entered data can be stored for a
reasonably long time. The characteristics of medical record structures can,
however, easily lead to a waste of computer storage space which is an order
of magnitude greater than the actual data storage space needed. This
occurs when data are stored in simple rectangular tables, since the variety
of medical data requires many columns, but at one encounter only a few
values will be collected. Hierarchical file organizations allow linkage to
a variable number of subsidiary data elements, and in this manner provide
efficient storage utilization, whereas the older tabular files dealt
poorly with medical data [Greene69]. The encoding techniques used for data
entry can also provide compaction of stored data since short codes are
used to denote long keywords.
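The difference between the two organizations can be made concrete with a
small Python sketch. The attribute names and the count of 200 data elements
are invented for illustration, and the functions rectangular_row and
hierarchical_record are hypothetical; the sketch only counts storage slots
and does not model any particular file system.

```python
# Hypothetical schema with 200 possible data elements.
ATTRIBUTES = [f"attr_{i}" for i in range(200)]


def rectangular_row(observed: dict) -> list:
    """One fixed-width row: a slot for every attribute, mostly empty."""
    return [observed.get(name) for name in ATTRIBUTES]


def hierarchical_record(visit_id: int, observed: dict) -> dict:
    """A visit header linked to a variable number of subsidiary elements."""
    return {"visit": visit_id, "elements": list(observed.items())}


# At one encounter only a few values are collected.
visit = {"attr_3": 140, "attr_17": 90, "attr_42": "light"}

row = rectangular_row(visit)
record = hierarchical_record(1, visit)

slots_in_row = len(row)
slots_in_record = len(record["elements"])
```

With three observations out of 200 possible elements the rectangular row
reserves 200 slots while the hierarchical record holds three, which is the
kind of waste described above.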

Data structures can often be compressed by suitable data encoding
techniques applied to the files. In particular, unobserved data elements
need no actual storage space. Data compression can reduce both the
storage requirements and the access times greatly [Wieder77]. When space
considerations are no longer an important issue in data organization, an
apparent tabular format can again be used, and this can simplify data
analysis programs. In the clinical databank described in section IV.C,
TOD, the compressed data, after encoding to account for missing, zero, or
repeating data, occupied only 15% of the original storage space.
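One simple technique that accounts for missing, zero, and repeating values
is run-length encoding, sketched below in Python. This is offered only as an
illustration of the general idea; it is not claimed to be the encoding used
in the databank mentioned above, and the function names rle_encode and
rle_decode are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal values into a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run length) pairs back into the apparent tabular form."""
    values = []
    for value, count in pairs:
        values.extend([value] * count)
    return values


# One attribute over 14 visits: unobserved, then zero, then repeating values.
column = [None] * 6 + [0] * 3 + [120] * 4 + [130]
packed = rle_encode(column)
restored = rle_decode(packed)
```

Because decoding restores the original sequence, analysis programs can still
be written against the apparent tabular format.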

Older data often become less interesting, and can be moved to archival
storage. Storage on magnetic tape is quite inexpensive and the data can be
recovered, if needed for analysis, with a moderate delay. In a well-run
operation data can be recovered from tapes for on-line access in about
an hour [Soffee76]. The major problems are the development of effective
criteria for selection of data for archival storage and the cataloging of
archival data, so that they can be retrieved when needed. Candidates for
archiving are detailed records of past hospitalizations and episodes of
acute illnesses.


I.D.3  Data Organization for Retrieval

The important point in research usage of databases is that information is
not produced by the retrieval and inspection of a few values, but rather
from the relating of many findings in accordance with hypothesized cause
and effect relationships. When the data files grow very large, repetitive
scans for data selection may become prohibitively slow, especially during
the data exploration phase. We distinguish the following phases in the
research use of clinical databases:

1.  Initial definition of the data to be collected, with consideration
for clinical needs. The expected usefulness is often based on
vague or ill-defined initial hypotheses.

2.  Exploratory analysis, using tabulations and simple graphics in
order to compare subsets of the population.

3.  Hypothesis generation based on perceived patterns, definition of
independent and dependent variables according to some clinical
model.

4.  Data validation and sometimes expansion of data collection in the
areas in which patterns appear interesting.

5.  Subset definition and generation so that differences due to the
independent variables can be made explicit.

6.  Exhaustive statistical analysis of the subsets to verify or refute
the hypotheses.

It is important to have good subsetting facilities and efficient access to
defined subsets. Such services are provided in many clinical systems, but
the techniques vary widely. Often the subsets are extracted and
manipulated as distinct databases [Habry77]. In other systems a subset is
kept as a collection of references to records in the main database
[German75], and in yet another system the subset is recreated from the
definition of the subset [Todd75].
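The three subsetting techniques behave differently when the main database
changes, which the following Python sketch tries to show. The records, the
threshold of 160, and the names is_hypertensive and recreate are invented
for illustration and do not describe any of the cited systems.

```python
import copy

# Hypothetical main database; the list position identifies a record.
records = [
    {"id": 0, "age": 34, "systolic": 118},
    {"id": 1, "age": 61, "systolic": 172},
    {"id": 2, "age": 70, "systolic": 165},
]


def is_hypertensive(record) -> bool:
    """Subset definition; the threshold is illustrative only."""
    return record["systolic"] > 160


# 1. Extract the subset as a distinct database (a copy).
extracted = [copy.deepcopy(r) for r in records if is_hypertensive(r)]

# 2. Keep the subset as references to records in the main database.
references = [i for i, r in enumerate(records) if is_hypertensive(r)]


# 3. Keep only the definition and recreate the subset on demand.
def recreate(definition):
    return [r for r in records if definition(r)]


# A later correction to the main database.
records[0]["systolic"] = 180

sizes = (len(extracted), len(references), len(recreate(is_hypertensive)))
```

After the correction only the recreated subset includes the changed record.
The copy and the reference list are cheaper to reuse but reflect membership
as it was when they were built.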

Since access to data in research is primarily by attribute field rather than
by patient record, it can be profitable to transpose the database [Wieder77].
Transposition generates one, possibly very long, record for each attribute
of the database. Such a record now contains a sequence of values of this
particular attribute for all patients or all visits of all patients. Many
current computer systems cannot manage such long records easily but the
benefits should be clear: to relate blood pressure results to dosage of
an anti-hypertensive drug only two records have to be retrieved from the
transposed file. In a conventional file organized by patient visit every
visit record is accessed to retrieve the two fields needed to accomplish
this comparison.
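Transposition can be sketched in a few lines of Python. The visit records
and attribute names below are invented for illustration, and the function
transpose is a hypothetical helper; a real transposed file would live on
disk rather than in a dictionary.

```python
# Conventional organization: one record per patient visit.
visits = [
    {"patient": "A", "systolic": 170, "dose_mg": 50},
    {"patient": "A", "systolic": 150, "dose_mg": 100},
    {"patient": "B", "systolic": 165, "dose_mg": 50},
]


def transpose(rows):
    """Build one long record per attribute, keeping visit order."""
    attributes = sorted({name for row in rows for name in row})
    return {name: [row.get(name) for row in rows] for name in attributes}


columns = transpose(visits)

# Relating dosage to blood pressure now touches only two records.
pairs = list(zip(columns["dose_mg"], columns["systolic"]))
```

Values at the same position in each transposed record belong to the same
visit, so the two attributes can be related without reading any other field.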

To avoid scanning an entire conventional file, access structures can be
created which speed up the record selection process. Attributes which
are expected to be used in record selection are entered into an auxiliary
index file, which is then maintained in sorted order. If the attribute is
"blood pressure" all hypertensives will appear at the beginning of the
corresponding index file. With every blood pressure value a reference
pointer to the corresponding visit record will be kept. Now only the data
records for patient visits where the blood pressure was high have to be
retrieved.
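A minimal Python sketch of such an index follows. The visit data, the
threshold of 160, and the names build_index and select_above are
hypothetical. The index is kept here in ascending order, so the high values
cluster at the end; a descending index would place them at the beginning as
described above.

```python
import bisect

# Hypothetical visit file; the list position serves as the record pointer.
visits = [
    {"patient": "A", "systolic": 118},
    {"patient": "B", "systolic": 172},
    {"patient": "C", "systolic": 140},
    {"patient": "D", "systolic": 165},
    {"patient": "E", "systolic": 190},
]


def build_index(rows, attribute):
    """Auxiliary index: (value, record pointer) pairs in ascending order."""
    return sorted((row[attribute], pointer) for pointer, row in enumerate(rows))


def select_above(index, threshold):
    """Pointers of records whose value exceeds threshold, by binary search."""
    start = bisect.bisect_right(index, (threshold, float("inf")))
    return [pointer for _, pointer in index[start:]]


index = build_index(visits, "systolic")
pointers = select_above(index, 160)
selected = [visits[p]["patient"] for p in pointers]
```

Only the three records named by the pointers are fetched from the visit
file; the other records are never read.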

Bitmaps provide a simplified form of indexing. Whereas an index is based
on the actual data values, a bitmap uses simple categorizations of these
values. In a list with entries which correspond to the records in the
data file a bit is set to one if the data values in the record meet a certain
condition. This condition could be a blood pressure greater than 160/100
[Ragan78]. Both indexing and bitmaps can be viewed as providing
the capability of preselection of relevant records. If the selection of
indexes or bitmap definitions matches the retrieval requests well,
access to conventional files can become much faster. The maintenance of
such access structures will of course require additional effort at the
time of data entry.
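The bitmap idea can be sketched in Python using an integer as the bit list.
The visit data and the names build_bitmap and positions are hypothetical,
and for brevity the condition tests only a systolic value above 160 rather
than a full blood pressure reading.

```python
# Hypothetical visit file; bit i of a bitmap corresponds to visits[i].
visits = [
    {"systolic": 118, "on_drug": False},
    {"systolic": 172, "on_drug": True},
    {"systolic": 140, "on_drug": True},
    {"systolic": 165, "on_drug": False},
    {"systolic": 190, "on_drug": True},
]


def build_bitmap(rows, condition) -> int:
    """Set bit i to one when record i meets the condition."""
    bitmap = 0
    for position, row in enumerate(rows):
        if condition(row):
            bitmap |= 1 << position
    return bitmap


def positions(bitmap: int) -> list:
    """Record positions whose bit is set."""
    found = []
    position = 0
    while bitmap:
        if bitmap & 1:
            found.append(position)
        bitmap >>= 1
        position += 1
    return found


high_bp = build_bitmap(visits, lambda r: r["systolic"] > 160)
on_drug = build_bitmap(visits, lambda r: r["on_drug"])

# Combining two conditions is a single bitwise operation.
both = positions(high_bp & on_drug)
```

Since each condition costs one bit per record, several bitmaps can be kept
and combined before any data record is retrieved.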

There are many cases where more computation at the time of data entry can
reduce the effort that is required at data retrieval time. In some
applications it may be known that certain computable results of the
collected data will be needed at a later time. Then such results may
actually be already computed and stored within the database when the source
data are entered. Typical of a precomputed or actual result is the maximal
value of a clinical observation on a given patient, say blood pressure,
which could be kept available so that no search through multiple visits is
needed to identify a patient with evidence of hypertension [Melski78]. Other
candidates for precomputation are totals, averages, or the range of values
of a variable [Wieder75]. The total amount outstanding on a bill and the
range of a diabetic's blood-sugar level are other examples.
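As a minimal sketch, precomputation at entry time might look as follows in
Python. The class PatientSummary, its field names, and the sample values are
hypothetical; the point is only that each stored result is updated when a
visit is entered, so that retrieval needs no scan of earlier visits.

```python
class PatientSummary:
    """Hypothetical per-patient results maintained at data entry time."""

    def __init__(self):
        self.max_systolic = None
        self.glucose_range = None  # (lowest, highest) seen so far
        self.balance_due = 0.0

    def enter_visit(self, systolic=None, glucose=None, charge=0.0):
        """Update the stored results as the source data are entered."""
        if systolic is not None:
            if self.max_systolic is None or systolic > self.max_systolic:
                self.max_systolic = systolic
        if glucose is not None:
            if self.glucose_range is None:
                self.glucose_range = (glucose, glucose)
            else:
                low, high = self.glucose_range
                self.glucose_range = (min(low, glucose), max(high, glucose))
        self.balance_due += charge


summary = PatientSummary()
# (systolic, glucose, charge) for three visits; values are invented.
for visit in [(150, 90, 40.0), (172, 140, 25.5), (160, None, 0.0)]:
    summary.enter_visit(*visit)
```

The cost is a small amount of extra work at every entry, and the stored
results have to be corrected if a source value is later changed.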


I.D.4  Data Presentation

Data from databases can be presented in the form of extensive reports
for manual scanning, as summary tabulations, or as graphs to provide rapid
visual comprehension of trends. An extensive data analysis may lead to a
printout of statistical findings and their significance, or may provide
clinical advice in terms of diagnosis or treatment. When simple facts are
to be retrieved the results are apt to be compact and easy to display or
print. If much computation is used to generate the information then
presentation of the end-results alone is rarely acceptable. Most medical
researchers will want an explanation of the data sources and algorithms
that led to the output results, as well as information about the expected
reliability of the final values.

These requirements increase the volume of the output for research
studies, so that printed reports dominate. In most clinical situations
less output is used, so that other methods may be practical. We have
seen the following alternatives:

1.  Detailed listings or rapid video presentations for
quick scanning of data.

2.  Cross tabulations or graphics to aid human pattern detection.

3.  Well-structured summaries, with automatic data selection and
advice for patient care.

4.  Summaries with explanatory backup available on a terminal
when needed.

5.  Structured report presentation for outside distribution,
as with billing or result publication.

During a routine patient encounter a paper summary is probably least
distracting, but in emergency situations video terminal access can be much
more rapid. Terminal access helps the researcher in the formulation of
queries, and graphics provide insight to clinicians uncomfortable with
long columns of numbers. As systems mature and become more accepted the
user should be able to move smoothly from one form of output presentation
to another, but most systems now in use do not provide many options for
data presentation, and even fewer offer a smooth transition between
interaction modes.


I.D.5  Database Administration

Even when all the right decisions have been made and a database exists,
there has to be an ongoing concern with reliability, adaptation to
changing institutional needs, planning for growth, and technical updating
of the facilities. In many institutions a new function, that of database
administrator, is defined to deal with these operational issues. The
database administrator needs strong support from management and high
quality technical assistance. Since the function is responsible for
day-to-day operations it is not reasonable to expect a high level of
innovation from the database administrator, but responsiveness to the
institutional goals is essential.


II.  USE OF DATABASES IN HEALTH CARE

Now that the concepts and operational issues of databases in health care
have been summarized, the application of databases in health care settings
can be brought forward. We will first relate database uses to the
categories of health care settings, and later in the chapter discuss
specific types of applications in greater detail.


II.A  Health Care Settings and the Relevancy of Database Technology

The obvious area of application of database technology in health care
is the maintenance of patient records. These medical records exist
in a wide range of health care settings, and the effectiveness of databases
for their management depends greatly on the environment [Wied&K78].

II.A.1  The Solo Practice

The private solo general practitioner finds that most of his needs for
automation are satisfied by relatively simple systems oriented towards
billing and schedule keeping. The cost of entering and keeping medical
data in a computer does not now provide corresponding benefits, since a
paper medical record can be kept close at hand [Castle74, Rodnic77]. The
low cost of small processors does create great interest in computer
applications by physicians, but current micro-processor systems do not yet
provide a convenient basis for the development of programs with complex
files [Zimmer79].

II.A.2  The Group Practice

The operation of a group practice, where several physicians and paramedical
personnel cooperate in giving care, creates some problems in access to
medical records. Data for the record are generated at multiple sites, but the
entire record should be complete and legible whenever and wherever it is
retrieved. Continuity of care can be greatly aided by a computer-based
system [Bresla76]. Here entry and storage of basic medical data, diagnoses,
procedures, prescriptions, and follow-up become worthwhile.

For multi-user operation a shared database, accessed from the individual
health care sites, provides benefits in access to the record and to the data
contained therein [Zimmer78]. A group may also wish to integrate its billing
service, and the shared medical record can provide the required linkage
[Worth78]. In a large group practice or health maintenance organization
(HMO) the management benefits of an accessible clinical database are also
considerable [Gaus73, Barnet79].

II.A.3  Specialty Practice

A specialty practice or clinic can further exploit the benefits of shared
access to data. Since most specialty clinics deal with long-term or chronic
diseases a longitudinal record can be collected on the patients, and such a
record can reflect the individual's response to tests and treatments
[Starme77]. In long term care the ratio of effort devoted towards diagnosis
versus treatment decreases, so that more care can be delivered by
paraprofessional personnel, and here well organized data presentations can
be especially effective [McDona77].

The data management of a specialty practice will also be easier to
structure. Whereas in a general practice a bewildering variety of data
has to be accommodated, a specialty practice can often organize its data
into standardized flowsheets [Fries74]. Such tabular data representations
are not only simple to use, but are also significantly easier to process.

The depth of specialized experience, acquired in specialty practice,
motivates the physicians to analyze the disease and treatment processes.
The physicians are then willing to deal with the procedures that aid
data quality maintenance, which are otherwise viewed as a distraction
from patient care.

II.A.4  The Hospital

A hospital, dealing with in-patients, presents an entirely different set
of problems. In the ambulatory settings discussed above, a patient is seen
at most once per day and his record will be kept active in the clinic for
several years. A stay in the hospital may only last a few days, and during
that period data entries and retrieval requests can occur within minutes of
each other. The active time frame for inpatient services is hence much
smaller than the time frame for outpatient services. Data input, processing,
and output have to be rapid, but data are not retained in an active state
over long periods. The benefits of a Hospital Information System hence are
mainly due to the communication provided through a shared database
[Watson77]. Rapid communication can lead to reduction in length of stay
and minimization of redundant diagnostic orders. When the database is used
like a blackboard, as a communication medium by the treating physicians,
it will reduce conflict in patient treatment procedures.

Hospital-based clinics for ambulatory patients have of course the features
of general or specialty clinics, and impose their own requirements on a
computer system that is to be shared for both hospital functions. Systems
that serve both functions well are rare [Collen74, Austin78]. At the NIH
Clinical Center, where the patients are referred for inclusion in particular
research protocols, work is in progress to create a database system for
long-term research and administrative purposes as a byproduct of the on-line
hospital system [Lewis77].


II.A.5  Clinical Research

An important use of databases is the support of clinical studies [Palley75].
Both in a controlled clinical trial [Peto77, Silver78] and in open
population studies [Fries72], careful management of data is essential.

When the studies become even moderately large in terms of population
and observation period, a database approach becomes essential. We find
that even where no actual database management system is in use,
well-defined programmed or manual procedures are used to complement
traditional computing facilities with database tools, such as data
dictionaries, generalized query programs, periodic backup, and data
verification processes. Without such care computerized medical records will
not be trustworthy and will not be the resource for research they appear
to be [Feinst70].


II.A.6  Non-Patient Databases

There are of course many data collections relevant to medicine which do not
contain patient data. Such non-patient databases are not discussed in depth
in this report, but they constitute an important facet of database usage in
health care. One type of non-patient database is the data collection
used to record, monitor, and assess the effect of new drugs on
animals prior to their release for clinical trials on humans. Substantial
databases exist that support toxicology studies [Oxman76]. The recent rules
on 'Good Laboratory Practice' by the Bureau of Drugs of the FDA specifically
address data handling requirements for toxicology studies [FDA78]. A
databank, the Laboratory Animal Data Bank (LADB), has been developed to keep
track of laboratory animals used as controls for such studies. This work is
done at Battelle Laboratories under sponsorship of a number of agencies
concerned with such toxicological testing. The National Library of Medicine
manages the program as one of its Specialized Information Services [BCL78].

Databases are also used to improve the management of health care education.
Such databases contain student and laboratory data, so that students can
receive appropriate assignments, be matched to the patient population, and
show adequate progress [Duncan78, Go78, Kreitz78].

Another set of non-patient databases used in health care are collections
of reference information for physicians or researchers. Well known are
the poison-control centers; these are databases that relate accidents
involving dangerous household and industrial substances to appropriate
treatment [Yokel78, 00e1178]. The organization of these systems approaches
that of information retrieval systems, rather than of databases, since
mainly retrieval, rather than data processing and data analysis, takes place
in response to an inquiry.


II.B  Current Health Care Applications of Databases

The use of databases implies the availability of large and relevant
quantities of data. A sufficiently large, complete, and accurate
collection of data is the basis for achieving believable results from
database processing. Since entry of large data volumes is costly, the
initial uses of databases have been in areas where the data were simple
to obtain and of relatively great value. Databases have been used in
the financial areas of health care delivery before they were used in
the clinical areas, and databases are used more often to support specific
studies than in general health care delivery situations [OTA77, Stei&B78].
To provide information for public health policy decisions data may be
entered initially at high levels of abstraction before the systems are able
to compute summarized data from detailed data collections. The problem
of data entry has been discussed earlier.

The effort needed to bring database projects into operation, and the
inherent delays before results can be produced, mean that many current
database oriented projects are difficult to evaluate. We find broad
acceptance today of database technology in the areas where databases were
introduced early, but also that some of the pioneering projects suffer
today from being bound to outdated database technologies [Brian79]. There
appears to be no fundamental barrier to acceptance of databases in medicine,
but there is a well founded "show-me" attitude which can be overcome by
demonstration of adequate operation and reasonable effectiveness.

II.B.1  Databases Used for Service Reimbursement

The largest databases in use that are related to health care are no doubt
the databases associated with the reimbursement mechanisms for health
services. In the United States these are the federal Medicare and the state
managed Medicaid programs. The latter are generally served by outside
private contractors; since the contracts are awarded on the basis of lowest
cost the contractors have kept the medical content of the databases as
limited as possible. Several other countries maintain databases associated
with government sponsored health care delivery [Anders77, Hall77, Nakaya77,
Reiche77, Sheple77]. Requirements for inquiry and audit are generating
requirements for more complete medical encounter information. This leads
the processing organizations which handle reimbursement accounting to
consider database technology, although much of the work today is based on
periodic processing of large files [KatzJR77]. That these databases can
provide useful information for health care delivery policy has been
demonstrated by the on-line Medicaid prescription collection system in
Alabama [MeselW76], where inappropriate use of several drugs was
demonstrated. This was one of the findings which led eventually to the
recent implementation of restraints on the prescribing of propoxyphene
hydrochloride (Darvon).

II.B.2  Disease-Specific Shared Databases

There is also a broad interest in the capture of population data for
those diseases that are so prevalent and costly that national concern,
sometimes in the form of disease-specific legislation, is focused on
them. In the use of these databases issues of policy and health care
delivery are intertwined.

Disease-specific databases with important health policy implications have
been supported by the National Cancer Institute (NCI) [Haensz66].
One program (SEER) [NCI74] provides surveillance of cancer incidence and
survival in nine geographic areas around the country [Young78]. A more
recent program, CCPDS, integrates the data collected at the comprehensive
cancer centers to evaluate the impact of treatment on disease patient
groups [Feigl79]. Similar broad-based efforts exist in psychiatry (MSIS)
[Logema73], kidney disease (RENTRAN) [Mishel76], and rheumatic disease
(ARAMIS) [Hess74]. The CCPDS database is described as an example of
this type in the Appendix, section IV.A.

Collecting and sharing of data from multiple institutions requires a major
effort to standardize data collection. In particular, the consistent
encoding of observations, already difficult within one institution,
becomes a major problem when the data are collected from multiple
institutions. When the data sources are separated by a day's travel the
difficulties are even greater. Broad disease classifications, such as
ICDA-9 [WHO78], have become accepted, but are inadequate within any
specialty. A comprehensive schema can provide a common definition for the
data elements that are to be shared. The procedures to encode data within
the schema may be made particular to each institution if they have
differing conventions for their source data collection.
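A minimal Python sketch of this arrangement follows: one shared definition
of a data element, with an encoding procedure particular to each
institution. The institution names, the local vocabularies, and the names
COMMON_SEVERITY, ENCODERS, and encode are all hypothetical.

```python
# Shared schema definition of one data element (hypothetical).
COMMON_SEVERITY = ("none", "mild", "moderate", "severe")

# Each institution keeps its own source convention and maps it to the schema.
ENCODERS = {
    "clinic_a": {"0": "none", "1": "mild", "2": "moderate", "3": "severe"},
    "clinic_b": {"absent": "none", "light": "mild",
                 "moderate": "moderate", "serious": "severe"},
}


def encode(institution: str, local_value) -> int:
    """Translate a locally recorded value into the shared internal code."""
    local = str(local_value).strip().lower()
    shared_term = ENCODERS[institution][local]  # KeyError if unknown
    return COMMON_SEVERITY.index(shared_term)


code_a = encode("clinic_a", 2)
code_b = encode("clinic_b", "Moderate")
```

Both observations end up with the same shared code although they were
recorded under different local conventions. An unknown local term raises an
error instead of being stored inconsistently.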

The solution provided through a schema, as described above, cannot overcome
all problems of inter-institutional data comparability. Because different
institutions will have differences in patient access, the subpopulations
from the cooperating institutions will have different demographic
distributions. This means that even when data are coded in a consistent
manner, comparability of findings from different institutions is
questionable. In general, uncontrolled pooling of multi-institutional data
is to be avoided.
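The effect of differing demographic distributions can be shown with a small
Python sketch. All counts are invented for illustration, and the function
names crude_rate and stratum_rates are hypothetical. The two institutions
have identical event rates within each age group, yet their overall rates
differ because their age mixes differ.

```python
# (patients, events) per age stratum; the counts are invented.
institution_a = {"under_65": (900, 45), "65_plus": (100, 20)}
institution_b = {"under_65": (200, 10), "65_plus": (800, 160)}


def crude_rate(strata) -> float:
    """Overall event rate, ignoring the age distribution."""
    patients = sum(n for n, _ in strata.values())
    events = sum(e for _, e in strata.values())
    return events / patients


def stratum_rates(strata) -> dict:
    """Event rate within each age stratum."""
    return {name: events / n for name, (n, events) in strata.items()}


rates_a = stratum_rates(institution_a)
rates_b = stratum_rates(institution_b)
crude_a = crude_rate(institution_a)
crude_b = crude_rate(institution_b)
```

A naive comparison of the crude rates would suggest a large difference
between the institutions where, within each age group, there is none.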

To support such shared databases a good communication system is required.
Early systems had to build their own communication networks [Logema73,
Mesel75]. Now commercial companies like TYMNET (used by MEDLARS, LADB and
SUMEX) and TELENET (used by ARAMIS [McShan78] as well as by MEDLARS) can
provide such services on a nationwide basis. The linkage to a network also
provides a potential advantage for investigators who wish to establish a
new database. They can shop around and determine which database system
and service is best suited to their needs. Once a large data collection
is established within any particular system, switching service suppliers
is nearly impossible.

When remote use is made of a central computer the apparent reliability has
to be as high as that achieved by local systems. That means that the real
reliability has to be very high. The apparent reliability is the product
of many factors: reliable electric power, reliable central hardware,
reliable and easy-to-understand software at the central site, good
communication services with a minimum of noise and inconsistencies, and
reliable terminal operation at the user's site [Fries78]. Providing all of
these in a well-integrated form demands a major management effort, which
adds substantially to the cost of a shared operation.
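The product relationship can be made concrete with a short calculation.
This is a sketch only: the component values are assumed for illustration,
and the calculation assumes that the components fail independently, which
the text does not state.

```python
import math

# Assumed fraction of time each component works; illustrative values only.
COMPONENTS = {
    "electric power": 0.99,
    "central hardware": 0.99,
    "central software": 0.99,
    "communication service": 0.99,
    "user terminal": 0.99,
}

def apparent_reliability(components):
    """Product of the component reliabilities, assuming independent failures."""
    return math.prod(components.values())

overall = apparent_reliability(COMPONENTS)
print(f"apparent reliability: {overall:.3f}")
```

With five components at 0.99 each the product is about 0.95, which is lower
than any single component. This is why each part of a shared system has to
be considerably more reliable than the level the remote user expects to see.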


II. B. 3 Databases in Health Maintenance Organizations

The larger health care delivery organizations have needs for management and
policy setting as well as requirements due to medical services. A prepaid
health plan or health maintenance organization (HMO) has to set rates that
allow it to be competitive while generating sufficient income to cover the
various services. The total cost of services will depend to a large
extent on the population being served. Workers, families with children,
the indigent, and the elderly all present different service patterns. Good
medical accounting practices and reliable data are important to determine
equitable rates. The outstanding example of such an operation is seen at
the Harvard Community Health Plan [Justic74], where a MUMPS-based system,
called COSTAR, provides services to two sites, one with about 20,800
patients. A generalization of this system, using schema-like parameters
[Zimmer78], will be described in section IV. D.

The data in the COSTAR database are used in the clinic to provide a printed
abstract of the medical record prior to every encounter and are available
for on-line inquiry. An associated schedule-keeping program provides the
information to allow most of these abstracts to be printed at night, so
that the scheduled patient encounters no delay for record delivery.

Encounter data are collected by the physicians on forms that are preprinted
with problem-specific check lists, and entered subsequently by clerical
personnel into the computer files. A small amount of free text can be
recorded to cover situations where the check list is inadequate. The
hierarchical structure of the MUMPS file system is well matched to the
patient-oriented view of such a clinical system:

    A patient is seen as having problems,
        for which the patient is seen by specialists in the clinic,
            an encounter results in a number of observations
            and treatment specifications,
                which may in turn have a number of data elements,
                    such as drug name,
                        with dosage, frequency, and duration.

This hierarchical view was already used in the earliest systems for HMO
support [Davis70]. In data model terms we have a nest of drugs within
a nest of encounters within a nest of problems for each patient entity.
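The nesting can be sketched as a set of record types. This is an
illustration of the data model only, not the MUMPS or COSTAR
implementation; all class names, field names, and sample values are
assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DrugOrder:
    name: str
    dosage: str
    frequency: str
    duration: str

@dataclass
class Encounter:
    specialist: str
    observations: List[str] = field(default_factory=list)
    drugs: List[DrugOrder] = field(default_factory=list)

@dataclass
class Problem:
    description: str
    encounters: List[Encounter] = field(default_factory=list)

@dataclass
class Patient:
    patient_id: str
    problems: List[Problem] = field(default_factory=list)

def drugs_for(patient):
    """Walk the nest top-down and yield (problem, drug name) pairs."""
    for problem in patient.problems:
        for encounter in problem.encounters:
            for drug in encounter.drugs:
                yield problem.description, drug.name

# Hypothetical patient record with one problem, one encounter, one drug.
patient = Patient("P-001", [
    Problem("hypertension", [
        Encounter("internist", ["BP 150/95"],
                  [DrugOrder("drug-x", "25 mg", "daily", "30 days")]),
    ]),
])
print(list(drugs_for(patient)))
```

Access that starts from the patient, as in the clinic, follows the nesting
directly; every lower level is reached through its parent.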

For management functions, clinical as well as administrative, the access
to COSTAR data is indirect. One reported instance in clinical management
was the need to prepare a list of patients who had received a certain
IUD that was reported to be potentially harmful. For this task the
file has to be searched patient by patient. Such processes are typically
done overnight, in effect in batch mode. The overnight delay is certainly
tolerable when reports that identify or aggregate many individuals have to
be produced. Administrative management is in fact accustomed to much
longer delays in conventional data processing operations, where special
reports can only be produced as a byproduct of regular periodic
processing. Guidelines for the financial management content of these
databases have been issued by HEW [Densen72, Gaus73].
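A patient-by-patient search of this kind can be sketched as follows. The
record layout, the identifiers, and the item names are hypothetical; the
point is only that a question asked across patients requires a full pass
over a file that is organized by patient.

```python
# Hypothetical patient-oriented file: each record nests problems and encounters.
PATIENT_FILE = [
    {"patient_id": "P-001", "problems": [
        {"name": "contraception", "encounters": [{"treatments": ["device-a"]}]}]},
    {"patient_id": "P-002", "problems": [
        {"name": "contraception", "encounters": [{"treatments": ["device-b"]}]}]},
    {"patient_id": "P-003", "problems": [
        {"name": "hypertension", "encounters": [{"treatments": []}]},
        {"name": "contraception", "encounters": [{"treatments": ["device-a"]}]}]},
]

def received(record, item_name):
    """True if any encounter under any problem of this patient lists item_name."""
    return any(
        item_name in encounter["treatments"]
        for problem in record["problems"]
        for encounter in problem["encounters"]
    )

def batch_search(patient_file, item_name):
    """Visit every patient record once and collect the matching patient ids."""
    return [r["patient_id"] for r in patient_file if received(r, item_name)]

print(batch_search(PATIENT_FILE, "device-a"))
```

The cost of the search grows with the size of the whole file, not with the
number of matching patients, which is why such requests are run as
overnight batch jobs.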

Issues to be resolved in the economist's arena are the trade-offs in an HMO
between preventive care and restorative care, task assignment to physicians
versus paraprofessionals, and the effectiveness of incentives for proper
utilization. The analyses must give proper consideration to the patient
population that is receiving services [Hershe79, Luft78]. The data gathered
in the database used within an HMO type operation can do much to resolve
questions of economic concern. Several broad-based systems exist and have
provided valuable data. For instance, the health services provided to a
largely indigent population, comprised of the Indians on several reservations,
have been recorded in a computer system by the Indian Health Service in
Tucson, Arizona [McArth78]. The data have provided information for health
care resource allocation.

Services to poor populations are provided by the federal and state
governments through a bewildering array of categorical grants for health
services. In South Carolina a system is in operation which was largely
motivated by the need to coordinate these various grants [Penick76].
By matching a patient's needs to the available services some degree of
comprehensive care may be achieved. Here a multi-dimensional view of the
database was implemented using a network type database system, IDMS.

Many of these systems define their initial database content using the result
of a comprehensive study [Murnag73] of data elements for ambulatory care.
The results were issued as a guideline by HEW [NCHS74]. A goal of the
study was to achieve a consistent database useful for health-care policy
decisions. While this study did not address health plans in particular, it
is only in this arena that we find consistent population-based data
collection [Linber78]. In some European countries [Gremy77, Hall77, Mase77]
where health services are largely provided with government assistance,
regional databases have been implemented, although their medical content
is often quite shallow.


II. B. 4  Surveillance  Databases 

Surveying the status of a patient population and generating
appropriate advice, either to health care personnel or to the patient, may
be an important function of a database. A simple database which is
oriented towards that function, CIS, is in operation at Regenstrief
Institute in Indianapolis, serving a number of chronic disease clinics
[Bharga74, McDona75].

In the diabetics clinic of Regenstrief Institute, where many of the services
consist of laboratory tests and the advice is often provided by
paraprofessionals, automated surveillance to track the patient's status is
especially effective. A special language has been developed so that the
algorithms or rules are easily written. The vocabulary consists of the
clinical terms embodied in the schema. The output of the system includes
printed advice statements, as well as fields for data elements to be
recorded during the current visit. The sequentially organized file is
passed rapidly against the rule base. Maintenance of the clinic visit
schedule is also provided. Upon leaving the patient receives a copy of
the advice report, which is based on the findings up to the current visit
[McDona76]. Each visit becomes a new entry in the database. This clinic
provides an excellent environment for computer-based assistance. Since
the objectives of such a database are initially defined, relatively complete
data collection can be attempted. Most patients will cooperate with the
goals of the clinic and the requirements of its system. Encounters are
scheduled, and data processing can be performed either immediately after
or prior to the encounter. The files can be organized for efficient batched
processing.
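The idea of passing a sequential file against a rule base can be sketched
as below. This is not the rule language of the system described; the field
names, thresholds, and advice texts are invented for illustration and carry
no clinical meaning. The sketch also assumes one current record per
patient.

```python
# Each rule: (name, predicate over one visit record, advice text).
# Field names and thresholds are hypothetical, for illustration only.
RULES = [
    ("glucose_high",
     lambda v: v.get("fasting_glucose", 0) > 140,
     "Fasting glucose above limit: review therapy."),
    ("glucose_missing",
     lambda v: "fasting_glucose" not in v,
     "No fasting glucose on file: order test."),
]

def advice_for(visit):
    """Collect the advice text of every rule whose predicate holds."""
    return [text for _, predicate, text in RULES if predicate(visit)]

def run_surveillance(visit_file):
    """Pass the sequential visit file against the rule base once."""
    return {visit["patient_id"]: advice_for(visit) for visit in visit_file}

visits = [
    {"patient_id": "P-001", "fasting_glucose": 180},
    {"patient_id": "P-002", "fasting_glucose": 100},
    {"patient_id": "P-003"},
]
report = run_surveillance(visits)
for patient_id, advice in report.items():
    print(patient_id, advice)
```

Because the rules refer only to terms that appear as field names, the
vocabulary of the rule base and the schema of the file stay in step, and
the whole file can be processed in a single batch pass before the clinic
session.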

An important application of such a system is also seen in the management
of patients on chemotherapy [Wirtsc78]. Here the use of protocols for
physicians at a number of remote sites permits an identical therapy regimen
to be provided. The consistency is exploited here in controlled trials,
where variations can make the results of the trial unusable. An evaluation
has shown that difficult protocols are followed much better when the system
is used. Eventually it may be feasible to interact with community physicians,
sparing the patients the disruption of travel and integrating the primary
and tertiary care system.

The aggregate database also serves an important function as a collection
of experience in treatment effectiveness as well as patient behavior
[Simbor76]. Surveillance is of course not limited to patients, but can
also be employed to track the behavior of the physicians or the clinic
operation. Quality assurance surveillance can, for instance, determine
whether hypertensives are monitored and whether patients with positive
tests, say for strep-throat, are being treated appropriately [Barnet78].
Such surveillance will improve data recording as well as patient care. Data
can only be relied upon if regularly used, and conclusions should not be
drawn from data which have not been verified by use or by audit. An
investigation of the operation of a clinical surveillance system showed that
incomplete record keeping was responsible for many of the findings of lack
of patient follow-up.


II. B. 5 Specialty Clinical Databases

It is in the specialty clinics that databases have had the most medical
impact. An extensive survey [Henley75], on which many of the observations
made here are based, found an effective database operation in a private
cardiology clinic in Oklahoma City [Wilson78]. The system was written
largely by one medical student using MUMPS and maintains patient status
data for routine, monitoring, and emergency visits. The system is
also used to help in scheduling appointments, and the combination of both
files provides data for billing purposes. A second example of a
specialty database is in an academic rheumatology clinic at Stanford
University, using TOD [Weyl75], where signs and symptoms of a new patient
can be compared with those of treated patients. The comparison generates
a prognosis of treatment effectiveness in the new case. The database
structure of these two systems differs dramatically: the former reduces the
patient's past history to a concise snapshot for easy review, whereas the
latter maintains a detailed time-oriented history for analysis. The
cardiology clinic at Duke also relies on an integrated universal patient
record [Starme74, Rosati75], whereas a MUMPS-based diabetes system at
Washington University records visits over time [Achten75]. We conclude
from these examples that the database model of the users is determined by
their medical view, rather than by the facilities provided through the
database system.

A strong emphasis on follow-up and time markers was noted in a specialized
minicomputer database which is oriented towards the tracking of major
workman's compensation cases [Leavit72]. An important economic incentive
exists here: since the insurance company reimburses the wages lost during
the recovery period it can invest in aggressive follow-up and medical
treatment in order to reduce its total cost per case. Such motivation is
lacking in insurance systems which reimburse for treatment only.


II. B. 6 General Clinical Databases

Many of the services to outpatients that are provided by specialty clinics
are equally appropriate to outpatients seen in hospital-associated general
clinics or in major group practices. In large institutions there is also
a need for coordination of services carried out by various units, which
may be physically or intellectually remote from each other. These larger
institutions have had a need for billing their costs to the patients and
to the patients' insurance companies, so that basic records of patient
visits are already maintained on computers. The reason for the visit also
should be recorded to assure that the encounter qualifies for reimbursement.
The upgrading of a system from those beginnings to include a medically
relevant database is not at all easy, and the motivation for the inclusion
of medical experience is less in a general outpatient clinic than in a
specialty clinic. In a general outpatient clinic there is typically less
shared responsibility for care, and the patient population is more diverse
and less committed to follow-up. In non-academic institutions we find
databases with little structure beyond the billing data. AUTOMED in
Cleveland, which serves private practices, permits addition of free-form
textual messages under several categories in each visit and provides a
keyword-oriented search capability. The structuring of data, such as the
selection of keywords, the definition of appropriate values, or the
creation of access capability to the message text for analysis, is left to
the physician-user. A few have used the system well, but most restrict
their use to pre-programmed billing services [Wied&M75].

Billing and usage-oriented data can contain information of interest.
Records of insurance charges, when processed regionally, have provided a
handle for the selection of cases for peer review [Buck74]. Data on usage
patterns from the Kaiser-Permanente system have provided information
important to health care system planners [Enthov78]. On a smaller scale,
in a family practice clinic a system using MUMPS provides feedback in
regard to patient visit management to physicians in training [Given77].
The medical record itself is collected by transcribing dictation into the
database. Medical data in this form are not easy to analyze for trends
across the population. Since there is a pharmacy on the premises, which
does collect numeric data, analysis of drug prescribing and drug use,
or at least of drug purchase patterns, is possible.

The complexity of data collection in general practice is demonstrated by the
multi-page coding form which was used at CHCP, the HMO sponsored by Yale
University [Brunje71]. Few data items were actually recorded per visit.
Some structured organization of the medical record to simplify data
collection appears necessary. The problem-oriented record structure
[Weed71] interacts well with well-defined medical data management [Hall76],
even if the rigor preferred by Dr. Weed [Weed78] is not maintained.

A comprehensive guide to computing in medical practice settings is
provided in "Computers for the Physician Office" [Zimm&R78]. This volume
also provides addresses of organizations and journals which are active
in this area.


II. B. 7 Databases in Research

Clinical databases are also important resources for research. Searching
for cases through paper medical records and then abstracting data from
them is tedious and costly. When a database is available the collection
effort has already been carried out. Feedback from the use of the data in
patient care should have verified the correctness. Selection of cases and
data is now carried out easily, making such a clinical database an
especially useful resource. Since the patients, and the data collected on
these patients, were not selected with the research objective in mind, some
care has to be taken in research-oriented analyses. The conditions under
which patients were entered into the database have to be understood, so that
problems of selection bias can be dealt with.

For purposes of statistical analysis a rigid, tabular format is preferred,
and encoding of observations is essential if problems of differing
terminology are to be avoided [Sibley77]. Data to be encoded include
diagnoses, subjective findings, stages of disease, and patient demographic
characteristics. Many encoding schemes have been used; they are often
locally developed to serve particular anticipated analysis needs [Stei&M78].
In general usage for disease and problem classification are the
International Classification of Disease, Adapted [WHO78] and the coding
scheme for ambulatory care of the Royal College of General Practitioners
[RCGP72]. Multi-dimensional coding schemes have been developed such as the
Systematic Nomenclature Of Pathology (SNOP), with code definitions for
each of four dimensions (topography, morphology, etiology, function)
[ACP09]. This coding was later expanded for medical care with the
dimensions (diseases, procedures) into SNOMED [Cote77]. Specialty areas
are developing their own versions. The Academy of Dermatology has now
published SNODERM with codes specific to dermatology in these six
dimensions [AA0078]. The topic of codes is a major field of research in
itself, and we will have to forego further exposition here. The actual
encoding of clinical observations for computer entry can be carried out in
a variety of ways. Common alternatives for encoding data for computing were
presented in section I.D.1.
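A multi-dimensional code can be pictured as one value per dimension, so
that cases may be selected on any dimension or combination of dimensions.
The sketch below uses the four SNOP dimensions named above; the code values
are placeholders and are not real SNOP codes, and the function and variable
names are assumptions.

```python
from typing import NamedTuple

class AxisCode(NamedTuple):
    topography: str   # where in the body
    morphology: str   # structural change
    etiology: str     # cause
    function: str     # functional disturbance

# Placeholder code values; these are NOT real SNOP codes.
FINDINGS = {
    "case-1": AxisCode("T-01", "M-10", "E-20", "F-30"),
    "case-2": AxisCode("T-01", "M-11", "E-21", "F-30"),
    "case-3": AxisCode("T-02", "M-10", "E-20", "F-31"),
}

def select(findings, **axes):
    """Return the case ids whose code matches every axis value given."""
    return sorted(
        case for case, code in findings.items()
        if all(getattr(code, axis) == value for axis, value in axes.items())
    )

print(select(FINDINGS, topography="T-01"))
print(select(FINDINGS, morphology="M-10", etiology="E-20"))
```

A single-axis classification would need a separate code for every
combination; keeping the dimensions separate lets each be searched
independently.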

Clinical databases may take many forms, but since in general a cause and
effect relationship is to be explored, we will need to collect patient data
that represent at least both events. The event which is the cause should
precede the event which shows the effect by some interval of time. If the
interval is short, then a single patient observation may suffice: studies
of effects of different treatments in emergency care have been of that type
[Slosbe78]. When data are abstracted from the medical record, then the past
history is available from the patient record and can be combined with
outcome variables into a single record. When the cause is unknown, but
possible correlates of the cause are observed, then a single snapshot may
also do: an impressive example is the correlation of US geographical sites
with cancer incidence [Breslo75]. The actual cause of the variability of
cancer incidence is left to further hypotheses. Verification of such
hypotheses may require new and more extensive databases.

In general the chance of capturing cause and effect with one snapshot is
small. In most situations the appearance of the outcome event is delayed
and unpredictable, so that the patient progress has to be observed and
kept over a long time period. In the more general databases many visits
for diagnosis and follow-up treatment have to be recorded and the time of
observation becomes a critical variable. These longitudinal databases
become more complex and often contain missing data [Gree&K77]. Certain
events in the patient's history are milestones for time measurements, for
example disease onset, start of treatment, and recognition of disease
stages. Data have to be matched appropriately to these milestones for
proper comparison and statistical analysis. An approach to deal with
such problems will be described in Section IV. C.
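Matching data to a milestone amounts to replacing calendar dates by the
time elapsed since the milestone. The sketch below shows this for the start
of treatment; the patients, dates, and values are hypothetical, and the
approach of Section IV. C may differ.

```python
from datetime import date

def days_since(milestone, observations):
    """Re-express each (date, value) observation as (days after milestone, value)."""
    return [((when - milestone).days, value) for when, value in observations]

# Hypothetical patients who reached the same milestone on different dates.
patients = {
    "P-001": {"treatment_start": date(1978, 1, 10),
              "obs": [(date(1978, 1, 10), 40), (date(1978, 2, 9), 31)]},
    "P-002": {"treatment_start": date(1978, 6, 1),
              "obs": [(date(1978, 6, 1), 42), (date(1978, 7, 1), 35)]},
}

aligned = {
    pid: days_since(p["treatment_start"], p["obs"])
    for pid, p in patients.items()
}
for pid, series in aligned.items():
    print(pid, series)
```

After alignment both patients have observations at day 0 and day 30 of
treatment, so their values can be compared even though the visits took
place months apart on the calendar.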

The demands made by researchers on databases are oriented towards rapid
access to large quantities of data. The resulting databases are often
relatively simple, and data have to be extracted from clinical databases
into these research databases prior to analysis. The research database is
then not kept up-to-date, but its stability permits comparison of results
using multiple analysis techniques. Database systems that are designed
specifically to support research are now becoming available commercially
or from public institutions and universities. These systems typically
provide schemas to simplify referencing of the stored data elements for
retrieval of subsets, and convenient linkage for statistical analysis.
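The extraction step can be sketched as flattening nested clinical records
into rows with a fixed set of columns. The column names and the record
layout are assumptions made for this illustration; values that were never
recorded appear as None, so that every row has the same shape.

```python
# Hypothetical column list acting as a minimal schema for the research table.
COLUMNS = ["patient_id", "visit_date", "weight_kg", "systolic_bp"]

def extract_rows(clinical_records, columns=COLUMNS):
    """Flatten nested visit records into fixed-width rows; absent values become None."""
    rows = []
    for record in clinical_records:
        for visit in record["visits"]:
            merged = {"patient_id": record["patient_id"], **visit}
            rows.append(tuple(merged.get(column) for column in columns))
    return rows

clinical = [
    {"patient_id": "P-001", "visits": [
        {"visit_date": "1978-03-01", "weight_kg": 70.5, "systolic_bp": 130},
        {"visit_date": "1978-04-05", "weight_kg": 69.0},
    ]},
]
research_table = extract_rows(clinical)
for row in research_table:
    print(row)
```

The resulting table is a separate copy. Later updates to the clinical
records do not change it, which gives the stability needed to compare
several analyses of the same data.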

Most available database systems of this type have the capability for
accepting incremental updates, but do not have the full integrity protection
and recovery mechanisms associated with the large database systems designed
to support on-line operational service requirements. Examples of such
database systems are RAMIS (provided by Mathematica of Princeton, NJ),
MISAR [Karpin71], WISAR [Friedm77], and RS/1, a derivative of PROPHET
(provided by BB&N of Cambridge, MA). The availability and usage of these
systems is increasing, since they provide good database capabilities at a
lower installation cost than locally developed research support systems.


II. C FUTURE USE OF DATABASES IN HEALTH CARE


There seems little doubt that usage of the various types of databases cited
above will continue to grow, even though the number of successful database
operations is still quite small. In many current systems the breadth of
usage and the degree of medical interaction is much less than was
hoped for. As systems develop so that they hold a wider variety of data
and become more capable in converting these data to information, they will
become more attractive to the users. Integration of proven services
and their implementation within clean and reliable system approaches
should remove some of the barriers which now deter acceptance of
interesting, but isolated systems. Conceptual advances will depend on how
well current research can resolve some important points, which are at a
level beyond the simple technical issue of building systems that carry out
the defined tasks.

II. C. 1 Cost-effectiveness Issues

Economic considerations are a main driving force in the proliferation
of databases. Escalating health care costs focus attention on the health
care system. Although technology is frequently identified as one of the
factors which drive costs higher, the cases cited seem to be instances
where excessive equipment has been obtained and is idle. There appear to
exist few situations where a reduction of active technology will reduce
health care cost. Medical records and databases are rarely identified as
being redundant or inappropriate. A case could actually be made against
extensive automated medical records for non-hospitalized patients who do
not have any chronic disease. For an acute, self-limiting disease in an
otherwise healthy person the physician gains little from the historical
medical record in any of its forms. As our population becomes older the
fraction of completely healthy people diminishes, so that substantial
medical databases remain warranted.

The information from databases is, of course, also vital to the
decision-making and planning processes in health care administration
[Brande76, Hoscov77] and such planning is often directed to health care
cost control. Since technology continues to lower the cost of the
components of database systems, while most other health care items get
more expensive, we can expect further expansion of databases in health care.

II. C. 2  Initiatives  and  Innovation  Due  to  Technology  Push 

The decreasing cost of microprocessors and computer memories has been
well documented. The mass storage devices, disks as well as new solid state
technologies, are not far behind. A new and relevant storage medium is
provided by magnetic domain or bubble memories. Advances in programming
and systems design may be less obvious but are equally important to
database technology.

The experience being gathered now in selected projects will provide the
filter for new ideas in programming and software. Such a filter is needed
since there are already many technological choices for medical databases.
Data entry, for instance, can be accomplished using any of a wide variety of
methods, such as: dictation and transcription, free text typing, forms that
permit easy encoding of the data, and selection of items from menus presented
on a display screen. Only the last two options have been accepted by
physicians in a clinical environment [Justic74, Schult76, Watson77].

Rapid interaction is required to present menus on-line, and providing the
required file access, processing, and communication speeds has been quite
costly. No generalized database system of the current generation can
present menus at the rate that physicians can use them. This has, in turn,
inhibited broader use of menu selection in medicine. Since the problem
appears clear, a search for a technological solution can be made. Perhaps
the menus can be distributed in a more direct manner to the terminals, so
that the needed performance may be achieved at lower file and communication
cost. Hardware that provides improved display distribution is feasible, and
when it is introduced into database systems, menu selection methodology may
become dominant.

The example cited above is but one instance where we expect that the
availability of new technology will encourage new system development
and wider dissemination of medical databases in general. Since data entry
is such a bottleneck, improvements in this area are apt to trigger new
database application efforts. Automatic scanning of typed documents and
computer-controlled management of video images are other technologies that
are apt to affect medical data management soon.

Developments in software technology can have a similar effect. There
are already many ways of information structuring: categorized free
text [Lyman76], hierarchically structured data [Greene70], sequential
files, relational databases [Mcleod75], and network structured databases
[Penick76]. No clear picture has yet emerged on which alternative, if
any, will dominate [Wieder76]. As our understanding of information
structures increases we will apply this understanding to medical
databases. Choices can then be made on a rational basis. To apply new
software technology to medical databases requires well-trained personnel.
Some of the effort can be obtained indirectly: the use of commercial
database products in medicine takes advantage of pre-existing tools. Their
application still requires medically oriented expertise. Both the
complexity and the importance of health care provide the needed incentive
to obtain the attention of database-oriented scientists.


II. C. 3  The  Human  Element 

No advances can be made unless there is a cadre of knowledgeable
individuals. Traditionally there has been a lack of competent people
who could integrate medical and computing knowledge. Critical
decisions affecting the medical interface have been delegated to
technicians. We do see a change occurring. Increasing numbers of
medically oriented computer science students, microprocessor owners
[Orosz78], and physicians with insight into automation will provide the
resource for implementation and dissemination of computer applications in
medicine. Training programs in Medical Information Science, often
sponsored by the National Library of Medicine, are in operation at several
institutions. As scientists with the necessary broad scope of expertise
become available, the technology and the techniques, many of which exist
already, can be appropriately employed. With better cooperation
between medical and computer scientists we can also expect that database
systems will affect medical care in a more direct manner, and
that new applications for databases will be found.

As the interfaces to the systems become friendlier, and the scope of
available services becomes more clinically relevant, direct interaction
with the systems will become more rewarding for physicians. There will be
less reason to deal with these systems through intermediaries. The
physicians' demands will in turn provide the impetus for further system and
interface improvements.

II. C. 4  Sharing  of  Information 

Sharing of knowledge, data, and database technology is essential to
progress. Dissemination of knowledge, in the form of research results,
occurs traditionally via publication. The communication networks, created
originally to connect people to computers, and computers to each other, are
beginning to provide a faster and more effective path for knowledge
sharing among scientists. This communication occurs through fairly
informal messages or eugrams [Lederb78], which are sent over the network
when the sender has an idea or need for some knowledge, and are read by the
receiver when he has the time and frame of mind to deal with external
inputs. The files used to store these snippets of scientific communication
do not use deep database technology, but logs of these eugrams, covering a
specific topic, are interesting repositories of knowledge development.

Databases for the sharing of data are more common than computerized sharing
of knowledge. Data collected at multiple sites are integrated by some of the
disease-oriented database systems, since a single site may not be able to
collect sufficient data for analysis. Clinical institutions may also be
limited in terms of research interests, so that a shared database can make
the collected data accessible to researchers at a variety of sites. Shared
access for researchers has been supported by the PROPHET system, a
national facility for pharmacological research. The system provides strong
hardware and software support for tabular data, computational analysis,
and graphical output presentation [Ransil74]. This central computer system
allows sharing of the files containing drug data as well as sharing of the
computational facilities to process the data [Cast&W74]. Data that are to
be shared require shared definitions of the attributes to assure consistent
collection and encoding. Lesion sizes and staging in cancer, as needed for
the CCPDS cancer database, are examples of difficult data definitions.
Comprehensive schemas can help if they make the definitions available on
the display screen during data entry. The commonality of disease and
treatment codes, enforced by the spread of reimbursement controls,
also has benefits here.

Data stored at a central site are typically not used for ongoing clinical
activities. Concerns about reliability and local responsibility inhibit
such usage, and administrators also fear loss of control over data important
for daily operation. Networking and other forms of inter-database
communication can increase the number of systems accessible for sharing
of data. We can foresee systems organized so that data are kept and
maintained locally, but where remote access for research is enabled. Then
research projects can aggregate or compare observations from distinct
health care service centers. Access to data at remote centers may not be
as fast as local access, but should be as easy.

Distribution of systems can make new technology accessible to sites of
similar interest. Distribution of COSTAR [Barnet79], CLINFO [Thomps76],
and the Duke Cardiology system [Starme75] avoids reinvention and
reprogramming of complex software. As an important byproduct such system
sharing creates human networks of common interest and terminology.

In general the hope is for a symbiosis. Sharing of data will require
more adherence to standard definitions. More standardization will make it
easier for researchers and analysts to compare their results and to
validate each other's findings. Only a few databanks have been
analyzed outside of the institution which originally collected
them, so that the scientific maxim of repeatability of experiments has
rarely been demonstrated in the database field. Repeatability is necessary
for scientific consensus. Broader agreement on scientific issues in
health care should simplify the establishment of effective health care
policies.


II. C. 5 Privacy in Databases

A continuing concern in medical databases is the guarantee of adequate
privacy. An argument often made by technologists is that the methods of
keeping the paper record are not very secure, but this is not an excuse for
inadequate access control in databases, since records that are searchable
by computer are much more easily misused. Much medical information is
distributed by the health insurance system, since reports of physical
exams are shared among insurers. The technology exists today to make access
to computer files very secure [DeMill78]. For instance, physicians' notes,
which would concern only a few database users, can be kept in
cryptographically encoded form. Then access is restricted to parties which
have been given the decoding key, typically some easily remembered
sentence. Such encrypted information is not processable by computer,
although the text can be filed and moved among computers.

Who should have access to what portion of computerized medical records is
an open issue. Since a database program can select individual data
elements by name, access can be made much more specific than is possible
with paper records or file systems that do not differentiate between data
elements. A study sponsored by the Society of Computer Medicine has
provided as a tentative guideline a matrix of access privileges to
the basic ambulatory data elements. The accessors considered are health
care providers, financial agencies, health care planners, medical
researchers, lawyers who represent the patient, and employers [Jelovs79].
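
Element-level access control of this kind can be pictured as a lookup
table from accessor category to permitted data elements. The following
Python fragment is only a sketch: the element names and the privileges
shown are hypothetical placeholders, not the entries of the matrix
published in [Jelovs79].

```python
# Hypothetical access matrix: accessor category -> data elements it may read.
# The entries are illustrative placeholders, not the published guideline.
ACCESS_MATRIX = {
    "provider": {"name", "birthdate", "problem_list", "medications"},
    "financial_agency": {"name", "social_security_number", "charges"},
    "researcher": {"birthdate", "problem_list", "medications"},
}


def visible_elements(record, accessor):
    """Return only the data elements of `record` that `accessor` may read."""
    allowed = ACCESS_MATRIX.get(accessor, set())  # unknown accessors see nothing
    return {name: value for name, value in record.items() if name in allowed}


record = {
    "name": "DOE, JANE",
    "social_security_number": "000-00-0000",
    "problem_list": ["hypertension"],
    "charges": 125.00,
}
provider_view = visible_elements(record, "provider")
```

Because selection is by element name, one stored record can serve several
classes of accessor without being copied into separate files.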

The only element to be denied to the provider is the social security number.
The extent to which different categories of health care providers should
be differentiated in terms of data access is not clear.

Since a major objective of databases is the sharing of data, there is little
benefit in the entry and collection of data unsuitable for dissemination.

We advocate leaving very sensitive or critical material out of the database
altogether. When data are transferred from a clinical to a research
environment then identifying characteristics such as name, birthday,
ID numbers, etc. are best deleted. Sometimes follow-up requires a
reverse linkage capability. Hashing and encrypting techniques exist which
permit linkages to be maintained in one direction only [Wieder77].
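
A one-directional linkage can be sketched with a keyed hash. The fragment
below uses HMAC-SHA256 from the Python standard library as a present-day
stand-in; it is not the technique described in the cited reference, and
the key and identifiers are hypothetical.

```python
import hashlib
import hmac


def research_id(patient_id: str, secret_key: bytes) -> str:
    """Derive a research identifier from a clinical identifier.

    The keyed hash is computed forward only: the clinical site, which holds
    the key, can recompute the research identifier for a known patient, but
    the research site cannot recover the patient identifier from it.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and identifiers, for illustration only.
key = b"kept-at-the-clinical-site"
first_visit = research_id("MRN-0012345", key)
follow_up = research_id("MRN-0012345", key)
other_patient = research_id("MRN-0012346", key)
```

The same patient always maps to the same research identifier, so later
observations can be appended to the research record without the research
site ever holding the name or record number.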

A specific problem which has to be faced in the effective utilization
of health care databases is presented by missing data. Clinical databases
will always have incomplete records and missing data, because at the time
of collection the relevance of an item often does not appear to justify
the cost of collecting and entering it. This leads to a number of issues
which have often been addressed, but not yet satisfactorily treated in a
general sense. The statistical routines used to process such data have to
be robust and designed to process missing data explicitly. Many algorithms
cannot deal at all with missing data. Then preprocessing is needed to
remove records containing missing data, but often the number of valid rows
and columns is drastically reduced, so that it becomes difficult to
generate significant results. Techniques to replace missing values with
appropriate interpolations may have less effect on the significance, but
generate discomfort in the mind of the analyst. A prerequisite in either
case is that the encoding scheme used recognizes missing data during data
entry and that the storage representation never confuses valid data and
missing data.
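
The requirement that missing values never be confused with valid ones can
be illustrated with a short Python sketch. The records and attribute
names are hypothetical; the point is only that the missing marker is
distinct from every legitimate value, including zero.

```python
from statistics import mean

MISSING = None  # explicit marker; never reuse a valid value such as 0 or -1

# Hypothetical visit records: (hematocrit, number of prior admissions)
visits = [
    (41, 2),
    (MISSING, 0),   # 0 prior admissions is a recorded value, not a gap
    (38, MISSING),
    (44, 0),
]


def complete_cases(rows):
    """Keep only the rows in which every attribute was observed."""
    return [row for row in rows if all(value is not MISSING for value in row)]


def observed_mean(rows, column):
    """Mean of one attribute over the rows in which it was observed."""
    values = [row[column] for row in rows if row[column] is not MISSING]
    return mean(values) if values else MISSING


usable = complete_cases(visits)
hematocrit_mean = observed_mean(visits, 0)
admissions_mean = observed_mean(visits, 1)
```

Removing incomplete records leaves two of the four rows here, while the
per-attribute means still draw on three observations each. This is the
loss of valid rows described above, on a small scale.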

In controlled clinical trials a large amount of effort is expended to
make the databases complete so that statistical conclusions can be
drawn with confidence [Haensz66]. The ensuing cost limits data acquisition
so that the approach of clinical trials is mainly effective when
well-defined hypotheses are to be tested. Often the number of candidate
patients is already limited due to the prevalence of the given disease and
treatment population. Little has been published about the database
methodologies which support this important and active area, although the
results of studies performed are always made public. A new journal on
controlled clinical trials may help to overcome this deficiency [CCT79].

New ways of dealing with missing data will have to be developed and
integrated into clinical database systems so that the conflict between
collection limitations in a practice and research demands for
completeness of data can be mitigated. One approach may be to perform the
data analysis at a higher level of abstraction [Blum78] so that multiple
observation types can be combined into a single medical concept. Then,
if selected observations are missing, the general medical concept can
still be supported from related, existing observations.
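
The idea of supporting one concept from whichever related observations
exist can be sketched as follows. This is not the method of [Blum78]; the
concept, the observation names, and the threshold values are hypothetical
placeholders and carry no clinical meaning.

```python
# Hypothetical mapping from observation types to one higher-level concept.
# Each entry: observation name -> predicate that signals the concept.
RENAL_IMPAIRMENT_EVIDENCE = {
    "serum_creatinine": lambda v: v > 1.5,    # placeholder threshold
    "blood_urea_nitrogen": lambda v: v > 25,  # placeholder threshold
    "proteinuria": lambda v: v is True,
}


def concept_status(observations, evidence):
    """Evaluate a concept from whichever supporting observations exist.

    Returns True or False when at least one supporting observation is
    present, and None when all of them are missing.
    """
    present = [name for name in evidence if observations.get(name) is not None]
    if not present:
        return None
    return any(evidence[name](observations[name]) for name in present)


# The creatinine value is missing, yet the concept can still be evaluated.
status = concept_status(
    {"serum_creatinine": None, "blood_urea_nitrogen": 40},
    RENAL_IMPAIRMENT_EVIDENCE,
)
```

The analysis then works with the concept rather than with any single
observation, so a gap in one attribute need not remove the patient from
the study.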


II. C. 7  Problems  of  Current  Interest 

To make databases more responsive to medicine a number of specific
scientific issues have to be addressed. When reasonable solutions appear
possible the results will have to be embodied in experimental systems
and presented to a broad audience of medical users to test the validity
of the approaches.

We are defining databases too much in computer-oriented terms rather than
in medically-oriented terms. Database schemas require that data be
identified as numeric, real or integer, or as variables that hold a string
of characters. Variables describing disease, type, stage, and location are
forced to be explicitly encoded outside of the database system, so that
natural terms are not used for input and are absent from database output.

Even well understood and common data element types, such as the date of an
event, are handled ineptly, in a variety of formats that are often not
obvious and hence error prone.
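
The hazard of date formats that are not obvious can be shown with a small
Python sketch. The source names and their formats are hypothetical; the
point is that the same digits denote different days unless each source
declares its format explicitly.

```python
from datetime import date, datetime

# Hypothetical per-source format declarations.
SOURCE_DATE_FORMATS = {
    "admissions": "%m/%d/%y",  # month first, two-digit year
    "laboratory": "%d.%m.%Y",  # day first, four-digit year
}


def parse_event_date(text: str, source: str) -> date:
    """Parse a date using the format declared for its source.

    Raises ValueError for text that does not match, rather than guessing.
    """
    return datetime.strptime(text, SOURCE_DATE_FORMATS[source]).date()


admitted = parse_event_date("03/04/79", "admissions")
sampled = parse_event_date("03.04.1979", "laboratory")
```

Both inputs read "03, 04, 79", yet the first is the fourth of March and
the second the third of April. Storing a single internal date type and
converting only at entry and display removes the ambiguity.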

We are not yet dealing well with data that represent time or relationships
of events that are time-oriented [Bolour79]. Manipulation of such data is
essential if we are to build valid cause and effect models. The problem
of inference from recorded events over time can be approached with various
methods, but the methods now available are either simplistic or quite ad
hoc. Statistical techniques to deal with data observed over time are based
on models that are overly simple. Time-lagged correlation assumes equal
time intervals, but in general we do not observe our patients at fixed
intervals, nor can we assume that a disease progresses at a fixed rate,
independent of the individual. Heuristic inference methods can capture
time dependencies, but are often quantitatively weak.
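
The effect of unequal visit intervals can be illustrated with a small
Python sketch. The series is hypothetical, and the calculation shown is a
plain rate of change per day, not a replacement for the statistical
methods discussed above. It assumes that no two visits fall on the same
day.

```python
from datetime import date

# Hypothetical series of (visit date, measured value) for one patient.
series = [
    (date(1979, 1, 2), 10.0),
    (date(1979, 1, 9), 12.0),
    (date(1979, 3, 10), 18.0),
]


def visit_gaps(observations):
    """Number of days between consecutive visits."""
    return [(d1 - d0).days for (d0, _), (d1, _) in zip(observations, observations[1:])]


def daily_rates(observations):
    """Rate of change per day between consecutive visits."""
    rates = []
    for (d0, v0), (d1, v1) in zip(observations, observations[1:]):
        rates.append((v1 - v0) / (d1 - d0).days)
    return rates


gaps = visit_gaps(series)
rates = daily_rates(series)
```

Measured per visit, the second change (6.0) looks three times as large as
the first (2.0); measured per day it is the smaller of the two, because
the visits lie 7 and 60 days apart. A method that treats visits as
equally spaced would misjudge the trend.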

Adequate higher level models are essential to the effective utilization of
the contents of databases, as has been strongly argued in [Schnei75]. The
physician's knowledge of the relationships of signs and symptoms to
diagnoses and outcome is as important as the observations themselves
[Pople75]. The acquisition and management of such knowledge is a central
topic of research in applied artificial intelligence [Pople72]. Application
of knowledge about the content of the database promises to help greatly in
making effective use of future databases [Kuliko77, Schnei78, Slamac77,
Baskin78]. The data in turn can be used to add quantification to the rules
used to represent medical knowledge.

Many of the problems in medical databases are attacked where and when
they become bottlenecks, usually first in a particular application.
Rarely is the literature consulted at that point; a programmer is pressed to
design and code a solution. The fact that Medical Information Science is
increasingly recognized as an autonomous field, with its own journals
[JMS77], will help to disseminate new solutions to the workers
in the area, so that future, better systems can be built on the results of
current research.

Scientific advances can increase the utility and the depth of database
usage in all health care application fields, although implementations
of databases will continue to differ in their emphasis
according to the health care environment.


II. D Effect of Databases on Health Care Cost, Quality, and Access

The sharing of information, made possible through the use of databases,
is expected to have positive effects on the health care system. The
mechanisms that lead to such improvements include:

a. Readily available information will reduce the need for duplication
   of laboratory tests.

b. Databank analyses that advise physicians of possible drug-drug
   interactions for a patient will reduce the frequency of iatrogenic
   illnesses [Hulse76, Morrel77, Cardon78].

c. Databank analyses that advise physicians of possible drug-laboratory
   test interactions for a patient will reduce the number of invalid
   laboratory tests [Young72].

d. Tracking of individual patients who are at risk can prevent inadequate
   follow-up, reduce morbidity, and reduce the associated long-term care
   costs [Johns77].

e. Computerized problem lists in the medical record can help assure
   that all problems of each patient receive attention, rather than just
   the most obvious ones.

f. The availability of a copy or an abstract of the medical record at
   all of the candidate encounter sites and at the time of the encounter
   can prevent misdiagnosis and over-prescription [Lyman76].

g. Records, perhaps with data selected and formatted for the particular
   site, can improve the effectiveness and scope of community and
   paraprofessional personnel, and thus support a multi-modal health
   care delivery system [Mesel76, Zielst77].

h. Data for health care research at various levels can be extracted
   out of clinical databases, so that costs of otherwise redundant
   data collection can be avoided.

These factors can all lower health care costs; several also affect the
quality of health care [Barn&W78]. The use of paraprofessionals should
improve access to health services.

It has been difficult to demonstrate the benefits of these systems to a high
level of statistical significance. The measures collected by the study on
the implementation of a hospital information system at El Camino Hospital
have that problem, even though the trends favoring the automated system
appear clearly [NCHSR77]. The effects to be demonstrated occur over a large
patient population and a long span of time. The before and after measures
of outcome show only small differences, but the high cost of health
care makes even small differences impressive in absolute terms. Since
health care does not stand still for measurement, many confounding events
take place during a study. It is nearly impossible to find two similar
health care institutions which can serve the roles of test and control sites
for an experiment. But informal assessment of operational database systems
takes place continuously. It is actually doubtful that any inappropriate
system will remain in operation very long. Eventual acceptance of a medical
database system can provide another, quite stringent, evaluation of
effectiveness.

Databases do provide the information needed to measure cost, quality, and
access, and can be viewed from this point alone to be a tool in the
improvement of health care [Donabe78]. In our study we found that,
contrary to our hypothesis, the presence of federal funding did not
make the eventual system less vital than systems that were privately
developed [Henley75]. Overall the failure rate was quite high, much
higher than the success rate; but failures can also be ascribed to many
factors other than the database technology. It is obviously important to
have a system that is well matched to the setting. The computers and their
software typically make up less than half of the system cost during
operation, and the time and effort spent by medical personnel with the
system is very valuable, so that medical relevance weighs more than direct
system cost considerations.

The effect that research results, obtained through use of medical
databases, have on health care delivery is even more difficult to measure.
Since most large clinical studies rely on databases the assessment has
to shift to the efficacy of such research. There is little doubt that, if
reasonable models of cause and effect are used, such research
increases our understanding of many disease processes. Such medical
evaluations will improve health care quality and reduce cost. If
understanding of disease processes also leads to the provision of
appropriate entry points into the system, by say systematic screening
of populations likely to be affected, then health care access is also
improved.


III. STATE OF THE ART OF DATABASE TECHNOLOGY IN HEALTH CARE


There is of course a strong interplay between database development in general
and its application in health care. Most basic research takes place outside
of the health care field. But applications of databases are necessarily
application area related, and current work spans the entire spectrum of
database use, since the ideal database does not yet exist anywhere.

III. A Systems in Research or Development Status

Many new database systems and experiments are motivated by specific
problems that are common in health care and in the systems that attempt to
serve the area. In this section we will touch upon two such problem areas
and indicate methods that have been tested or are under test in current
systems. No single system can attempt to advance the state of the art in all
problem areas at the same time. As specific problems reach reasonable levels
of resolution the most successful methods should be integrated into systems
which have concentrated on resolving problems in other areas. This
integration of solutions has not always happened, partially because of
difficulties in the transfer of new and incompatible technology, sometimes
because of lack of awareness of approaches that were demonstrated in similar,
but not directly related areas of application. Some types of problems may
affect one category of database more than another, since system requirements
depend on the institutional setting or on the type of user served by the
database. In the Appendix we will describe in detail several systems which
are at the forefront of their area of application and which embody many of
the aspects needed for dissemination of database technology. We have already
considered the critical problems of data entry and management of missing
data because of their particular importance to medical databases.

The  support  for  controlled  clinical  trials,  presented  in  section  II. B. 4 
above,  is  now  being  extended  with  the  use  of  distributed  small  computers 
(Data  General  Micro-Novas),  which  are  to  be  placed  at  the  health  care 
delivery  sites.  Data  entry  and  protocol  advice  can  be  supported  locally; 
at  night  the  systems  communicate  with  the  central  database  (GMDB),  where 
the long term data for the Southwest Cooperative Study Group are being
maintained. 

The work on the PROMIS system has recently concentrated on the data entry and
transmission problems [Schult75]. Special terminals with touch-sensitive
screens are used, and data are transmitted in packets over a shared coaxial
cable, a method which provides very high performance at a reasonable cost
[Wanner78]. Languages to simplify the definition and use of the display
frames on the terminals are part of several system efforts. Other techniques
to deal with data entry are not associated with particular system
developments. Avoidance of redundant entry, perhaps combined with collection
close to the source of data, using mini- or micro-computers interacting with
medical personnel, appears to be a fruitful direction. Voice data entry will
become available as a means to collect data using limited vocabularies,
similar in style to menu selection schemes.

Artificial intelligence techniques to deal with missing data are being
explored in the RX project [Blum78]. Here the detailed data observations
from the ARAMIS database are being integrated into higher level concepts,
which may then be used to define a patient's progress at a more clinically
relevant level. Proper management of time-oriented causality is an important
aspect of this work.


III. B Industrial Status

There are two directions in current commercial efforts. First there are
companies which specifically address the medical market, and then there is
general database system development. Some of these general systems may
serve certain medical application areas quite well.

III. B. 1 Medical Database Systems

We have defined throughout that databases consist of data, and systems
to manipulate the data. While hardware and software vendors will make
file and database systems available, the data collection and information
applications remain in medical hands. Since it is difficult to develop
medically relevant systems without having the required medical expertise
we find that commercial firms often develop their prototypes within a
specific medical environment. The chosen health care institution may
benefit by obtaining a system which satisfies its particular needs,
without the expense of paying for a handcrafted system, and the vendor
benefits from the access to health care expertise, which would be nearly
impossible to duplicate in the vendor's domain. But there are also major
costs for the institution in such a cooperative venture. The system
development, since it needs to have general applicability, will take more
time than would be required for an institution-specific system, and
the vendor will find that some aspects of the institution are so particular
that they will not be served well by the system. It is also easy to
underestimate the cost in physician, nurse, and management time involved
in participating in a development project. Excessive expectations may lead
to disappointment for either party; the greatest disappointment occurs if
the vendor fails, either in terms of producing an acceptable system or
totally as a business.

Most systems now on the market had their beginnings in cooperative efforts.
The Technicon Medical Information System for hospitals was developed at El
Camino Hospital in Sunnyvale, California, and even though there were
traumatic moments, the hospital now obtains services at a favorable price,
and Technicon markets the system to other hospitals. Another example of a
hospital system is given in the Appendix in section IV. E. Here a tripartite
partnership was in operation: a software company, Dynamic Controls,
developed the programs, the hospital obtained the IBM hardware, and the
result is marketed with IBM assistance.

The MUMPS and COSTAR systems were developed at Massachusetts General Hospital,
by its Laboratory of Computer Science, with major support from the National
Center for Health Services Research. The programs are hence in the public
domain, and a number of companies have taken them, improved them, and started
to provide services based on them. A problem due to the multiplicity of
MUMPS vendors has been that a number of language dialects have developed.
In 1978 a standard definition of MUMPS was accepted by the American National
Standards Institute, and it appears that all new work will be based on this
standard.

In Germany a project of some health insurance agencies is developing a system,
based on PASCAL-oriented microcomputers, for distribution to their client
physicians. The intent is to combine the keeping of simple medical records
and claims processing in the office, and to transmit the information to
central computers via mailable disks or by direct linkage.


III. B. 2 General Database Systems That Are Applicable to Health Care

Whereas in the past few commercial database systems have had the required
flexibility, reasonable scale, and pleasant human interface to be useful
in a medical setting, we see today that this picture is changing. Simple
database systems are becoming available on many midi- and mini-computers.
Terminal interaction, using formatted displays, is becoming better
understood. Requirements for reliable long-term storage are aided by
improved software technology and improved hardware. Hierarchical
commercial systems, such as the MRI/INTEL System 2000, are being used to
store patient-oriented records in clinical settings. CODASYL based
network databases are seen in large institutional settings [Penick76],
and a FACOM system serves an eye clinic in Osaka [Watana78]. The Laboratory
Animal Data Bank sponsored by NLM uses the Battelle BASIS information
retrieval system, as well as TYMNET for access, and this combination has
provided comprehensive and reliable services at relatively low cost.

Data General's DBMS system, INFOS, is used with specialized hardware and
software to manage documents which track the recovery process of workmen's
compensation cases [Buchol78], and to assure follow-up of care services.

The National Spinal Injury Research Data Center at the Good Samaritan
Hospital in Phoenix, Arizona, collects regional information on spinal injury,
its treatment, and the costs of treatment and rehabilitation. It uses a
simple network system, IMAGE/3000, available for the Hewlett-Packard HP-3000
computers [Jewel79]. These two systems are distinguished in that they
consider not only the medical costs, but also the cost of having a person
disabled during the recovery period.

There is also a flow from medicine to the commercial world. The MUMPS
system, developed for and in a medical environment, is now being marketed
by DEC for general minicomputer data management as DSM-11 [Bowie76]. A
standardization effort has focused attention on the language and system
[Wasser76], and recent implementations make use of modern data structures,
specifically B-trees, which provide the hierarchical access without
excessive dependence on specific hardware parameters [Wieder77].

The demands made in medicine for statistical analysis are a model for
interactive statistics in other planning environments. Many statistics
packages have had their origin in medical research, and have now entered
a more general market. A particular example is RS/1, a research support
database system sold by BBN. It shows its heritage from PROPHET, the
pharmacological research support system developed by BBN under NIH
sponsorship, and employs for the data analysis tasks the BMDP statistical
package, which was developed at the Health Sciences Computing Facility at
UCLA, also under NIH sponsorship.


III.C  Current  Directions  of  Development 


Improving the interaction and high level usability of medical systems
is a major ongoing effort. The addition of decision criteria [Warner78],
knowledge about the data [Levy78], and heuristic rules to relate medical
events [Shortl79] are basic to this line of development. Permitting
queries formulated in natural language can remove barriers to accessibility
[Epstei78]. Improved coupling of these techniques to databases can make
the collected experience more valuable. Alternatives to knowledge
extraction from databases can change the manner in which research results
reach the physician [Blum78]. A review of computer projects supported by
the National Center for Health Services Research, many of which address
issues discussed here, can be found in [NCHSR79].

While the use of a DBMS is traditionally associated with fairly large
computers, a number of DBMS's are now becoming available for small machines.
Large machines have not always been able to provide the reliability,
priority of access, and low cost desired in the medical area, so that DBMS
technology was not easily available in health care. The majority of medical
databases in use today do not use a DBMS, but it is likely that the usage of
DBMS's will increase. The continued reduction of computer hardware prices
favors this tendency. The cost of writing software is not changing as fast,
so that the inefficiency of hardware utilization caused by the use of a more
powerful, but standard, product is probably less costly than the effort of
installing and maintaining a specialized and optimal system. There will
remain situations where specially tailored database support is needed to
bring performance within the constraints of critical time limits. For high
volume operation some operational cost reductions might be gained from
such tailoring. Problem-specific adaptations are often more easily applied
to a file system than to a database system.

We have already discussed issues associated with access to databases on a
distributed basis. The availability of public networks will accelerate
this development and eventually permit accessing multiple related databases
within one analysis task. Communities of medical scientists will be bound
together by shared interests, exemplified in the databases, rather than by
the boundaries of the institutions which employ them. Centers of excellence
will maintain specialized databases, while their associates can be remote
and interact with colleagues, patients, and students in their institutions.

Databases will then be the repositories for expertise: knowledge will
be tested for validity against the collected data, and quantitative
parameters will be based on database analyses.

IV. A Public Health: THE CENTRALIZED CANCER PATIENT DATA SYSTEM

[This material is based on the introductory description by Polly Feigl and
other personnel of the SAQC, Fred Hutchinson Cancer Center, Seattle, WA.]

INTRODUCTION 

The Centralized Cancer Patient Data System (CCPDS) is a standard
system for registering persons with reportable malignant neoplasms who
are patients of comprehensive cancer centers in the United States.

Eligible patients are those first admitted to a center on or after
July 1, 1977; they are reported to the Statistical Analysis and Quality
Control (SAQC) Center in Seattle, Washington.

OVERVIEW  OF  SAQC 

The SAQC Center, located at the Fred Hutchinson Cancer Research Center,
consists of three units that carry on the technical activities (a
Field Liaison Unit, a Data Processing Unit, and an Epidemiology Unit) plus
an Administrative section. Most data is received at SAQC on tape and
is immediately subjected to computerized analysis. Resultant reports
inform the submitting center of cases accepted into and rejected from
the database. Rejected data is corrected at the source for
resubmission. Communication regarding technical matters is generally
carried out by field representatives assigned to centers. Centers'
data coordinators collectively provide advice to SAQC via the
Technical Advisory Committee. Within this group, special subcommittees
are involved with Quality Control and Training, Data Utilization, and
Research Planning. A Policy Advisory Committee, also composed of
center representatives, affords further advice on a policy level.

Thirty-eight items of information are collected on each patient,
including demographic characteristics, diagnosis, therapy, and survival.
Standardized definitions of data items have been documented in the
"CCPDS Data Acquisition Manual" (DAM). This manual also includes
recommended procedures for abstracting, coding, submitting data to SAQC,
and quality control.

OBJECTIVES  OF  CCPDS 

The database being collected has a number of purposes. Selection of
cases over a broad population will allow clinical researchers to locate
similar patients for detailed comparative analyses. Treatment conventions
may differ among the cancer centers, and such differences and their effects
may be identified through the CCPDS database. Knowledge about the
prevalence of various types of cancers and the changing effectiveness of
treatments can warrant increased efforts in specific areas.

Initially, standard definitions and codes had to be established for
reportable patients and tumors, as well as for each of the thirty-eight
data items. Then, criteria for quality control were set up to assess
accuracy, completeness and timeliness of reporting. There is a
continuing effort to maintain intercenter comparability and compatibility
with other national and international cancer reporting systems. CCPDS
data is disseminated according to policies and procedures developed by
a Policy Advisory Committee for that purpose.

On a long-term basis, modular expansion is seen as the way for CCPDS to
function for collaborative analytic studies. Special areas of interest
identified include staging for certain sites, therapies, etiologies, health
services, and rare neoplasms. The central registry can serve as a means of
identifying patients for studies or of identifying institutions with the
potential to contribute patients to such studies.

PATIENT  DEFINITION 

A patient who is reportable to CCPDS is any individual with a frankly
malignant tumor who is seen as an in- or out-patient at the center and is
assigned a hospital or clinic number. Included are patients whose diseases
have been clinically diagnosed but not microscopically confirmed; patients
not diagnosed at the center but referred for therapy of recurrent or
late metastatic disease; and patients who are clinically free of disease
but are admitted to the center for adjuvant or prophylactic anti-cancer
therapy, if that admission occurs within two months of the initial
treatment.

Patients  who  are  excluded  due  to  this  definition  are  consult-only  cases, 
cases  diagnosed  at  autopsy  and  former  cancer  patients  with  no  evidence  of 
residual  disease  who  are  admitted  for  rehabilitation  or  for  treatment  of 
some  other  condition.  Also  excluded  are  cases  of  basal  and  squamous  cell 
carcinoma  of  the  skin. 

CCPDS  DATABASE  AND  COMPREHENSIVE  CENTERS 

The data processing system at SAQC has been set up to register about 47,890
new patient cases each year. Follow-up data is added annually on all
registered patients. Patient data is entered into the database only after
successfully passing all of the SAQC edit checks.

As of March 1979, twenty-two cancer centers have been designated as
comprehensive centers, based on criteria which include having an adequate
statistical base. Some centers are actually consortiums of hospitals
which collectively submit data to SAQC. The centers are: University of
Alabama, Colorado, Duke University, Florida, Fox Chase & University of
Pennsylvania, Fred Hutchinson, Georgetown & Howard Universities, Illinois
Cancer Council, Johns Hopkins, Los Angeles County & USC, Mayo Clinic,
M.D. Anderson, Ohio State, Roswell Park, Sidney Farber, Sloan-Kettering,
UCLA, Wisconsin, Yale University, Detroit, and Columbia University.

QUALITY  CONTROL 

A major on-going effort at SAQC is directed toward assessing the quality
of CCPDS data. A Data Monitoring Plan has been written for monitoring
accuracy, timeliness and completeness of data, as well as compliance with
established SAQC data acquisition rules and procedures. Coding practices
at contributing centers are measured in several ways, one of which is by
applying computerized edit checks to data submitted to SAQC. Also, SAQC
field representatives visit each center annually to independently
reabstract and recode a random sample of previously reported cases. A
coding reliability study was conducted during 1978 for which a standard
set of test cases was sent to each center for abstracting and coding.

Item error rates of up to 16% were found, and up to 32% in staging codes.

The various procedures for assessing quality of data allow for comparing
coded data both between centers and between centers and SAQC. These
early efforts toward enhancing data quality will pay off when the data is
utilized for research studies.
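
The comparison of reported codes with independently recoded cases can be
pictured as a per-item disagreement count. The following is a minimal
sketch in Python, not the procedure actually used at SAQC; the function
name item_error_rates, the record layout, and all sample codes are
assumptions invented for illustration.

```python
from collections import Counter


def item_error_rates(original, reabstracted):
    """Return, per data item, the fraction of cases whose codes disagree.

    Both arguments are lists of dicts (one dict per case, in the same
    order) mapping an item name to its coded value. This layout is a
    hypothetical one chosen for the sketch.
    """
    disagreements = Counter()
    compared = Counter()
    for first, second in zip(original, reabstracted):
        for item in first.keys() & second.keys():
            compared[item] += 1
            if first[item] != second[item]:
                disagreements[item] += 1
    return {item: disagreements[item] / compared[item] for item in compared}


# Invented sample: codes as first reported by a center ...
reported = [
    {"primary_site": "174", "stage": "2", "sex": "2"},
    {"primary_site": "162", "stage": "3", "sex": "1"},
    {"primary_site": "153", "stage": "1", "sex": "1"},
    {"primary_site": "185", "stage": "4", "sex": "1"},
]
# ... and as recoded independently from the same charts.
recoded = [
    {"primary_site": "174", "stage": "3", "sex": "2"},
    {"primary_site": "162", "stage": "3", "sex": "1"},
    {"primary_site": "153", "stage": "1", "sex": "1"},
    {"primary_site": "185", "stage": "4", "sex": "1"},
]
rates = item_error_rates(reported, recoded)
```

In this invented sample one of four staging codes differs, so the staging
item shows a 25% disagreement rate while the other items show none.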


LIST OF CCPDS DATA ITEMS

INITIAL REGISTRATION

Item No.  Data type

Identification
   1  Institution Code
   2  Patient Identification Number and Check Digit
   3  File Number
   4  Birthdate

Demographic Information
   5  Birthplace
   6  Race/Ethnicity
   7  Sex
   8  Residence at Time of Admission

Diagnosis
   9  Date of First Admission to Center for This Tumor
  10  Sequence
  11  Date of Initial Diagnosis
  12  Primary Site
  13  Laterality
  14  Histology
  15  Diagnostic Confirmation
  16  Date of Best Diagnostic Confirmation
  17  Stage of Disease at Time of First Therapy at Center

Therapy

Cancer Therapy Prior to Admission to Center
  18  Surgery
  19  Radiation Therapy
  20  Chemotherapy
  21  Endocrine Therapy
  22  Immunotherapy
  23  Other Cancer Therapy

  24  Date of Initial Therapy at Center

Initial Therapy After Admission to Center
  25  Surgery
  26  Radiation Therapy
  27  Chemotherapy
  28  Endocrine Therapy
  29  Immunotherapy
  30  Other Cancer Therapy

Patient Status
  32  Date of Last Contact/Death
  33  Autopsy
  34  Cancer/Treatment Related to Death

FOLLOW-UP ITEMS

Identification as before

Follow-Up Information
  35  Date Report Prepared
  36  Method of Follow-Up
  13  Laterality
  32  Date Last Contact/Death
  33  Autopsy
  34  Cancer/Treatment Related to Death

PROTOCOL ITEMS

Identification as before

Protocol Information
  37  Date Entered on NCI Protocol
  38  NCI Protocol Identification
  37  Corrected Date Entered on NCI Protocol
  38  Corrected NCI Protocol Identification

IV. B Randomized Controlled Clinical Trials: HARVARD PUBLIC HEALTH

[This information was provided by Dr. William G. Costello of the Harvard
School of Public Health.]

The Division of Biostatistics and Epidemiology in the Sidney Farber Cancer
Institute of the Harvard School of Public Health provides data processing
and analysis services to groups engaged in clinical trials. The primary
groups being served are consolidated as the Eastern Cooperative Oncology
Group (ECOG) and the Radiation Therapy Oncology Group (RTOG).

The data processing system for the ECOG and RTOG consists of software,
hardware, and procedures which have been developed over the past 3 years
and now operate smoothly for all studies of these groups. It encompasses
every aspect of the processing, from data collection to data updating and
error checking, and retrieval and analysis of data. A modular design was
adopted which allows for flexibility of operation and routine incorporation
of new Group studies or data formats. Although the software is largely
general-purpose, the system is specifically oriented toward clinical
cooperative group activities.

Overall  System  Design  -  Input 

The overall design of the data processing for ECOG and RTOG may be seen
by tracing the flow of data through the system. Data forms are
received from member institutions via the Group's Operations Office,
and are logged and checked for completeness. Data Managers conduct
extensive manual checking of the data to ensure high quality. This
includes verifying that:

1  The patient was eligible for the particular study.

2  The treatment was given according to protocol.

3  The toxicities were reported correctly.

4  There was adequate documentation for tumor response evaluation.

5  The required data items have been answered.

Any queries about the data or requests for more information are sent back
to the contributing institution. During 1977, more than 1100 query
letters were sent to ECOG investigators. When complete case records are
available, the Data Manager prepares a Case Evaluation Form. This
provides a vehicle for noting eligibility or protocol compliance problems
or any problems in evaluating what happened to the patient. This is sent
to the Study Chairman to aid in reviewing the records. Ultimately, the
contributing institution receives a copy of this evaluation, so that a
formal feedback mechanism is available to them.

Most forms used in these groups have codes printed with the boxes to be
checked, so that they are completely ready for data entry personnel once
checked by the Managers. All self-coding forms and data entry documents
are sent to data entry, where they are keyed and verified.

Data items are checked for syntax and range as they are keyed. The
records are then transmitted to the DEC-20 computer operated by the
division and distributed to the responsible Data Manager's directory, along
with electronic mail notifying the Data Manager that he or she may now
update the study files. The update is initiated by submitting a control
file. Updates are run in batch mode at night.

During the update run, new data items may be calculated, for instance
survival time. Both these calculated items and the primary items are
checked with automated editing procedures which include:

1.  Format of input data

2.  Presence of "must fill" fields

3.  Proper values - range checks and checks for special "allow" values

4.  Logical checks - comparison of data items in the patient
record for consistency and plausibility.

Any exceptions in the data which are detected by these automatic
procedures are reported to the Data Managers. They are responsible
for correcting or resolving the discrepancies; query letters to
institutions may be generated at this stage, as well. Once past the
editing stage, the input data are merged into the master study data
files.
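
The four kinds of automated editing listed above can be sketched as one
routine that returns the exceptions to be reported to a Data Manager. This
is only an illustration in Python, not the ECOG/RTOG software; the field
names, ranges, "allow" value, and the particular logical check are
assumptions.

```python
from datetime import date

# Hypothetical dictionary entries: the field names, ranges and "allow"
# values below are invented for illustration.
DICTIONARY = {
    "age": {"must_fill": True, "type": int, "min": 0, "max": 110,
            "allow": {999}},
    "entry_date": {"must_fill": True, "type": date},
    "death_date": {"must_fill": False, "type": date},
}


def edit_record(record):
    """Return a list of exception messages for one patient record."""
    exceptions = []
    for name, rule in DICTIONARY.items():
        value = record.get(name)
        if value is None:
            # Check 2: presence of "must fill" fields.
            if rule["must_fill"]:
                exceptions.append(f"{name}: missing must-fill field")
            continue
        # Check 1: format of input data.
        if not isinstance(value, rule["type"]):
            exceptions.append(f"{name}: wrong format")
            continue
        # Check 3: range check, with special "allow" values exempted.
        if "min" in rule and value not in rule["allow"]:
            if not rule["min"] <= value <= rule["max"]:
                exceptions.append(f"{name}: out of range")
    # Check 4: logical check across items in the same record.
    entry, death = record.get("entry_date"), record.get("death_date")
    if isinstance(entry, date) and isinstance(death, date) and death < entry:
        exceptions.append("death_date: precedes entry_date")
    return exceptions


clean = {"age": 54, "entry_date": date(1977, 3, 1),
         "death_date": date(1978, 1, 15)}
suspect = {"age": 140, "entry_date": date(1978, 5, 1),
           "death_date": date(1978, 1, 15)}
```

A record passes when the returned list is empty; otherwise each message
names the item that needs correction or a query letter.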

Backup copies of the data files are maintained, so that recovery of data
files is possible in the event of hardware or software failures. Multiple
backups are kept in physically separate and remote locations.

Supporting the automatic editing and updating functions is a system
of data description files, called the Data Dictionary. These files allow
the editing and updating programs to proceed automatically, because they
contain the formats, field widths, allowable values and cross-checking
procedures which are used for these functions. Central maintenance of
the Data Dictionary is the responsibility of the Data Base Administrator.
The Data Base Administrator also coordinates all data file maintenance and
initialization operations.

Overall  System  Design  -  Output 

All outputs from the study data files are based on two principal
features: the QUIRE retrieval system and the Data Dictionary facilities.
These capabilities allow retrieval of any data, even by non-programmers,
with minimal training and work. The system has data independence
features which make it possible for users to be insulated from any
changes in data formats or additions of new data types. This simplifies
the use of the data and provides for dynamic growth in the structure of
the data base.

The QUIRE retrieval system is a collection of programs which allow
the user to retrieve any data in the study data files, without
requiring detailed knowledge about data formats or any computer
programming on the part of the user. In order to specify the desired
output, the user may merely indicate which study or studies he is
interested in, and give a list of identifiers of the data items he
wishes to retrieve. The QUIRE software refers to the Data Dictionary
to find out how the data is stored and other necessary details. The
system automatically generates the output data file. In addition, the
system generates a description file which contains the necessary
formats and labels to describe the retrieved data file. This is an
important aspect, because it allows building of report generation and
analysis programs which can automatically access that data file without
further programming effort on the part of the user.
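
The idea of a dictionary-driven retrieval, in which the user names only the
items wanted and receives both the data and a description of it, can be
sketched as follows. This is not the QUIRE software; the identifiers,
column positions, and sample data lines are assumptions made for the
example.

```python
# Hypothetical Data Dictionary: identifier -> (label, start column, width).
# Columns are 1-based positions in a fixed-format data line.
DATA_DICTIONARY = {
    "PATNO": ("Patient number", 1, 5),
    "AGE": ("Age at entry", 6, 3),
    "SURV": ("Survival time in days", 9, 5),
}


def retrieve(data_lines, identifiers):
    """Return (output rows, description) for the requested identifiers."""
    rows = []
    for line in data_lines:
        row = []
        for ident in identifiers:
            _label, start, width = DATA_DICTIONARY[ident]
            row.append(line[start - 1:start - 1 + width].strip())
        rows.append(row)
    # The description lets later programs read the output without
    # knowing how the study file itself was laid out.
    description = [(ident, DATA_DICTIONARY[ident][0]) for ident in identifiers]
    return rows, description


# Two invented data lines laid out as PATNO (5), AGE (3), SURV (5).
study_file = ["00017 63  412", "00018 58 1290"]
rows, description = retrieve(study_file, ["PATNO", "SURV"])
```

Because the caller supplies only identifiers, a change in column positions
would be made in the dictionary alone, which is the data independence
property described above.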

The output subsystem includes software for data transformation (creating
new data items, re-scaling or re-grouping data values, etc.), listing of
data items by patient, and generation of cross-tabulations. Analysis
programs include the SPSS statistical package, the IMSL subroutine
library, and many programs developed in the Statistical Laboratory which
are important to the analysis of clinical trials data. These include
sophisticated procedures for evaluating treatment and other effects based
on multidimensional contingency tables and censored survival data.

Programs may produce computer listing output, special data files for later
analysis, or graphical output. The latter is particularly useful both for
analysis of complex data relationships and for representation of results
in compact form to clinical investigators in the Cooperative Groups.

The study data files are organized as separate files for each study. Each
file has a sequential data layout. This allows good performance for the
analysis of data from a given study by statisticians.

The Data Dictionary is composed of several files describing various
aspects of the data. Generally, its organization reflects the input data
to the system. It is based on segments of data, each segment consisting
of a number of related data items. For example, a self-coding form may be
described as a segment. A file exists for each segment, and its records
contain:

1.  A short identifier,

2.  A short alpha-numeric descriptor,

3.  Details about where and how the information is stored, and

4.  Data range checking information to be used for automatic editing
when the item is updated.

There are general segment description records in another file. Finally,
different views of the data are supported through a system of directory
files, which catalogue the description files needed for a given application.
The user need only identify the application (for example, all data for a
given study) and the QUIRE system takes care of finding the appropriate
descriptions through the directory files.

Report  Generating  Capability 

For instance, an excerpt from the System Data Dictionary (SDD) file for
ECOG data could be a directory file describing the view of the data
corresponding to the protocol ECOG 4274. This would contain the names
of all segments comprising the description of ECOG 4274. Each segment
description contains variable description information for each data
element and includes:

1.  Data element key

2.  Data element FORTRAN format type and length (for instance: I5)

3.  Starting position in data line (16)

4.  Logical segment type corresponding to a form (A)

5.  Data line type (C)

6.  Some descriptive information for each data element

7.  Minimum legal value for each variable

8.  Maximum legal value for each variable

9.  Four "allow" values for each variable, accepted even if they
do not fall into the range given by 7 and 8.

In updating, extensive cross-variable checking procedures are also used.
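
A segment description record of this kind might be modelled as below. The
sketch handles only FORTRAN integer (I) formats, and the element key,
label, legal range, "allow" value, and data lines are all invented; only
the example format I5, starting position 16, segment type A, and line type
C come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ElementDescription:
    key: str             # 1. data element key
    fortran_format: str  # 2. FORTRAN format type and length, e.g. "I5"
    start: int           # 3. starting position in data line (1-based)
    segment: str         # 4. logical segment type corresponding to a form
    line_type: str       # 5. data line type
    label: str           # 6. descriptive information
    minimum: int         # 7. minimum legal value
    maximum: int         # 8. maximum legal value
    allow: tuple         # 9. up to four "allow" values outside the range

    def read(self, data_line):
        """Extract this element from a data line and check its value."""
        if not self.fortran_format.startswith("I"):
            raise ValueError("only integer (I) formats are handled here")
        width = int(self.fortran_format[1:])
        text = data_line[self.start - 1:self.start - 1 + width]
        value = int(text)
        legal = self.minimum <= value <= self.maximum or value in self.allow
        return value, legal


# Hypothetical element: key, label, range and allow value are invented.
wbc = ElementDescription("WBC", "I5", 16, "A", "C",
                         "White blood count", 0, 50000, (99999,))
prefix = "427400017AC0001"  # invented filler occupying columns 1 to 15
```

The same record thus serves both retrieval (where the item is stored) and
editing (whether the stored value is legal).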

IV. C Clinical Research: TOD, THE TIME-ORIENTED DATABASE SYSTEM

[This material is based on an overview paper by Drs. Dennis McShane and
James Fries of the Stanford University Medical School.]

The Time-Oriented Database System supports data banks primarily dealing
with chronic diseases. A major user is the ARAMIS project, with six
operating data banks in rheumatic disease from six institutions. Other
current users include the Northern California Cancer Center and the
National Stroke Data Base.

The individual data banks utilize the same file structure, format, and
common entry and retrieval programs. Programs and software are
schema-driven and content-independent.

Two components underlying ARAMIS are a database definition for
rheumatic diseases and the TOD Software System. The Uniform Database
for Rheumatic Diseases, promulgated by the American Rheumatism Association
Computer Committee, consists of a vocabulary of 422 variables
describing attributes of the various rheumatic disease processes. It
ranges from descriptors of demographic information to symptoms, signs,
laboratory values, diagnoses, and therapies. This database definition is
the standard vocabulary followed by rheumatologists seeking to clinically
describe patients with rheumatic disease and has widespread acceptance in
the rheumatology community.

The TOD System was developed within the ACME project at Stanford
University for ease of data retrieval in useful clinical research formats
and is now maintained by the Stanford University Computing Facility.
It is being further developed by ARAMIS programmers, and currently
operates on an IBM 370/168 under a locally maintained timesharing system,
Orvyl, with an adaptation of IBM PL/1. Programming objectives are beyond
the scope of this review, but include heavy use of macros common to many
programs, internal self-documentation, and optimization for retrieval,
even at the expense of entry or storage considerations. Every effort has
been made to keep system design simple for the physician-user and to keep
search strategies intuitively reasonable.

Each TOD data bank contains two distinct structures. The main file
maintains patient records, each containing an array of data from a
single patient visit. Visits are entered or updated interactively.
The transposed file contains corrected and validated data and is used
for retrieval. Each of these files can be considered separately in
more detail.

TOD  Main  File 

The main file consists of all information, organized by patient visit.

In the TOD System a patient course is conceptually considered to be a
two-dimensional array of numbers in flowchart form. Each column
represents a series of observations (or elements) for a single patient
made at the same time; each row represents the values of a single
variable in the same patient over time.

By adding additional patient courses, a third dimension is created,
whereby any value in the data bank may be accessed by three coordinates
within a conceptual cube: the name of the variable, the name of the
patient, and the time-point of the observation.
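
The three-coordinate access described here can be pictured with a nested
mapping. This is a conceptual sketch only and says nothing about how TOD
stores its files; the patient names, dates, variables, and values are
invented.

```python
# cube[patient][visit date][variable] -> value; all entries are invented.
cube = {
    "patient-A": {
        "1978-01-10": {"hematocrit": 41.0, "wrist_pain": 3},
        "1978-04-02": {"hematocrit": 39.5, "wrist_pain": 2},
    },
    "patient-B": {
        "1978-02-20": {"hematocrit": 44.0, "wrist_pain": 0},
    },
}


def lookup(variable, patient, time_point):
    """Address one value by its three coordinates in the conceptual cube."""
    return cube[patient][time_point][variable]


def course(variable, patient):
    """One row of a patient's flowchart: a single variable over time."""
    visits = cube[patient]
    return [(when, visits[when][variable]) for when in sorted(visits)]
```

Fixing the patient gives the two-dimensional flowchart of a single patient
course; fixing the variable as well gives its row over time.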


Thus, a clinical data bank in ARAMIS can be defined as serial observations
for a given set of variables in a defined population. While this
three-dimensional structure may seem obvious, hierarchical system designs
in medicine have not formalized the critical time dimension.

TOD Transposed File

The transposed or retrieval file is a rearranged main file in which all
values for each attribute of the Uniform Database are placed into a
separate record. For instance, in the TOD/ARAMIS System there are 422
such records.
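
The rearrangement from visit-oriented records to one record per attribute
can be sketched as follows. The in-memory layout and sample values are
assumptions for illustration and do not reflect the actual TOD file
formats or the Transpose Program.

```python
def transpose(main_file):
    """Rearrange visit records into one record per variable.

    main_file is a list of (patient, visit_date, {variable: value})
    tuples; the result maps each variable to its list of
    (patient, visit_date, value) entries in the original order.
    """
    transposed = {}
    for patient, visit_date, values in main_file:
        for variable, value in values.items():
            transposed.setdefault(variable, []).append(
                (patient, visit_date, value))
    return transposed


# Invented main file: one entry per patient visit.
main_file = [
    ("P1", "1978-01-10", {"hematocrit": 41.0, "esr": 30}),
    ("P1", "1978-04-02", {"hematocrit": 39.5}),
    ("P2", "1978-02-20", {"hematocrit": 44.0, "esr": 12}),
]
transposed = transpose(main_file)
```

A retrieval that concerns one variable across the whole population then
reads a single record instead of scanning every visit.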

This file makes data readily available for individual study, for detecting
correlations and interactions between variables, for comparison of
therapeutic interventions with changes in clinical course or laboratory
values, for life-table outcome analysis, for charting distribution of
values within a given patient population, and for a host of additional and
potential search programs.

TOD Schema File

The schema is the first file in the TOD System. It is a map which defines
each database and differentiates it from other TOD databases. The schema is
the direct reflection of the user's research concerns, and therefore must be
designed carefully in order for the resulting database to meet the user's
needs.

Underlying all TOD data handling is the descriptor file. The descriptor file
is a machine-readable file, derived from a schema defined by the data bank
user, which serves as a template for the stratifying, intervention, or
outcome variables for which values are to be collected in the
physician-patient encounter.

All TOD programs reference the schema to get information on the meaning of
the different elements. Therefore, the schema must exist in a
machine-readable form. The process of developing a schema involves creating
a human-readable text file in the computer which syntactically describes
each element in the schema completely. Then a TOD program called TRANSLAT
creates from the schema source file a machine-readable file, called the
descriptor file. Another program lists the descriptor file, producing a
document which reflects the contents of the operating schema.

Storage of data in a TOD database is cost-effective because the data files
contain only data values. The meaning of those values is provided through
the schema. For example, if the value of Parameter 71 for Patient 1?3 at
Visit 1 is 4? 6, the schema could be consulted to discover that this
parameter is defined as Hematocrit.

Header  and  Parameter  Elements 

Information is then entered interactively from the patient chart, which is
patterned after the descriptor file, by means of the entry program, which
references the descriptor file. Patient information is collected into one
of two files, termed "header" or "parameter", according to whether
the data are demographic, and hence have only the patient number as their
ruling part, or subject to change over time, so that the ruling part
consists of both the patient number and the visit date.

Header elements are those defined once for each patient, such as name, birth
date, sex, etc. Note that although some variables, such as the patient's
address, may occasionally change, it is normally not necessary to preserve
these changes through time; thus, the street address is another typical
header element. Often a database owner will also want to keep certain dates
as header elements, such as the date of the first symptom of the disease,
the date of the first diagnosis, and so forth.

Parameter elements are those items recorded at each visit, so that
multiple values of the element exist for each patient. Parameter
elements must be numeric and should be amenable to statistical analysis.

Data  Types 

There are seven ways of representing clinical information in TOD, as
listed. These types were developed to best describe elements of the
disease process. The data types in TOD are: VALUE, CHARACTER, +RANGE,
DATE, DISCRETE, CODED, and OCTAL. A CONFIDENTIAL data type remains unused.

VALUE specifies a continuous variable (such as a laboratory test) which
may have any degree of precision: serum creatinine = 1.4 .

+RANGE elements are assumed to be semi-quantitative disease descriptors
as are commonly utilized to describe degrees of severity or abnormality on
a scale of 0 to 4+: wrist pain = 3+ .

DISCRETE is employed for integer values: number-of-pregnancies = 1 . This
definition limits certain computational procedures.

CHARACTER type provides for the representation of textual data in the
computer: name = Susan Jones . Computation capability does not exist for
such data, since the encoding is not controlled. The data type CODED provides
for controlled strings. Only references to CHARACTER strings are kept in
visit files, so that computation is not delayed by large character entries.
The character strings themselves are kept on a remote file. The current
ARAMIS implementation limits character type attributes to header items.

CODED variables are ways of keeping attributes with controlled definitions in
the system. The terms are kept and internally assigned to numeric values for
compactness and limited computation. For example, "sex" will be internally
represented in TOD as "1" and "0" for male and female, respectively.
By having a coded data type for this element, the entry clerk would
not need to remember that "0" stands for female, but could type female or F
and have the system understand that this should be represented by "0".
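
A coded element of this kind amounts to a controlled table from entry terms
to internal numbers. The sketch below is an illustration only, not TOD
code; the class name CodedElement and the spellings accepted for male are
assumptions, while the codes 1 and 0 and the entries female and F follow
the example in the text.

```python
class CodedElement:
    """Controlled vocabulary that maps entry terms to internal codes."""

    def __init__(self, name, terms):
        # terms maps an internal code to its accepted spellings; the
        # first spelling listed is used when the code is printed.
        self.name = name
        self._canonical = {code: spellings[0]
                           for code, spellings in terms.items()}
        self._lookup = {spelling.lower(): code
                        for code, spellings in terms.items()
                        for spelling in spellings}

    def encode(self, entry):
        """Translate what the entry clerk typed into the stored code."""
        try:
            return self._lookup[entry.strip().lower()]
        except KeyError:
            raise ValueError(
                f"{entry!r} is not a defined term for {self.name}") from None

    def decode(self, code):
        """Translate a stored code back into its canonical term."""
        return self._canonical[code]


sex = CodedElement("sex", {1: ["male", "M"], 0: ["female", "F"]})
```

Unlike a CHARACTER element, an entry outside the controlled list is
rejected at entry time, which is what keeps the stored values comparable.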

DATE allows the capture and computation of times: birthdate = 30JUL67 and
age = TODAY - birthdate are permitted entry forms, and computation of
differences and intervals is possible on the internal form.

OCTAL is a means whereby eight integers may be stored under one variable.
Through this representation on the Stanford Orvyl file system, the use of
storage and, hence, cost may be decreased.

Programs

The Chart Dump Program is the only retrieval program which uses the main
files, and is used to reconstitute a patient record for use within the
clinic setting. The Transpose Program creates, from the main header and
parameter files, the files used for retrieval. These are organized by
individual variable. Rapid and efficient retrieval in the traditional
scientific formats occurs from these files with reference to the descriptor
file. The Subset Program creates a library of patients meeting
user-specified criteria, and this library is accessed by the other retrieval
programs in studying defined subsets of the accumulated data.

The following table lists some of the currently available retrieval programs
in TOD. When asking a research question, the individual TOD investigator
will typically choose several programs.

RETRIEVAL  PROGRAMS 

PROFILE  - Histographic Distribution
CRITTER  - Diagnostic Criteria Counts
SCATTER  - X-Y Graph of Variables
MULTREVU - Mean and SE of Variables in Subsets
OUTCOME  - Life Table Analysis of Variables
AUTOSET  - Computer Consultation
LIST     - List of Variables in a Subset
TIMESCAT - X-Y Graph of Variable Over Time
SUBSET   - Group Meeting Specified Criteria
MW       - Ranking Variables by Logistic Regression


Thus,  one  might  execute  a  Profile  analysis  on  a  given  population  to 
ascertain  the  presence  of  sub-populations  for  further  subsetting  by 
the  Subset  Program.  If  this  pertains,  a  Multrevu,  which  examines 
the  elements  of  the  Uniform  Database  between  sub-populations  and 
looks  for  differences  in  mean  values  between  groups,  might  be  selected. 
Or  an  Outcome,  utilizing  life-table  analysis,  might  detect  differences 
in  prognosis  between  groups  over  time.  The  dynamic  nature  of  the 
system  allows  for  interactive  reformulation  of  the  evolving  research 
question.  A  typical  retrieval  program  requires  less  than  a  second  of 
interactive computer time and less than three minutes of investigator
time,  including  problem  specification  and  output  printing. 
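
The subsetting and comparison steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the TOD programs themselves: the function names subset and mean_and_se, the record layout, and the sample values are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def subset(patients, criterion):
    """Return the patients meeting a user-specified criterion."""
    return [p for p in patients if criterion(p)]


def mean_and_se(patients, variable):
    """Mean and standard error of one variable within a group."""
    values = [p[variable] for p in patients if variable in p]
    return mean(values), stdev(values) / sqrt(len(values))


# Invented sample data; sex uses the coded values 0 and 1.
patients = [
    {"sex": 0, "esr": 30.0}, {"sex": 0, "esr": 34.0},
    {"sex": 1, "esr": 18.0}, {"sex": 1, "esr": 22.0},
]
women = subset(patients, lambda p: p["sex"] == 0)
men = subset(patients, lambda p: p["sex"] == 1)
```

Calling mean_and_se on each group for the same variable gives the pair of figures an investigator would compare when looking for differences between sub-populations.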

IV. D HMO Support: COSTAR 5

[This material is based on the Functional Description of the COSTAR 5
System. The Laboratory of Computer Science, Massachusetts General Hospital,
Cambridge, Mass., gave permission to use this copyrighted material here.]

INTRODUCTION  AND  OBJECTIVES 

COSTAR (COmputer-STored Ambulatory Record) is a computer-based
ambulatory information system which improves and expands upon the
capabilities of a traditional medical record. Although use of the
term "record" has historical precedence, COSTAR is more appropriately
considered an information and communication SYSTEM designed to meet
both the medical care and financial/administrative needs of either
a fee-for-service or a prepaid group practice health maintenance
organization (HMO).

The  central  objectives  of  COSTAR  are  to: 

1) Facilitate patient care by improving the availability of
medical information in terms of accessibility, timeliness of
retrieval, legibility, and organization.

2) Enhance the financial viability of the medical practice
by providing a comprehensive billing system with accompanying
accounting reports.

3) Facilitate medical practice administration by providing
the data retrieval and analysis capability required by
management for day-to-day operation, budgeting, and planning.

4) Provide data processing support for administrative and
ancillary services, e.g., scheduling, laboratories, and planning.

5) Provide the capability to generate standardized
management reports and support manager-specified inquiry and
report generation on any elements of the database.

6) Support programs of quality assurance by monitoring the
content of the database according to physician-specified rules
and automatically reporting any deviations from these standards
of care.

COSTAR is designed to have minimal impact on the physician's habit pattern
of recording information. It is also designed to be the SINGLE integrated
information system for the practice and thus to eliminate duplicate data
recording and duplicate information processing for medical care, billing,
and administration. All data are collected by the medical staff using
specifically designed forms and are entered into the system by clerical
staff through simple interactions with the computer using video terminals
directly connected to the computer. The collected data are stored on
magnetic disks so that the information is always available and
simultaneously accessible at multiple locations.

SYSTEM MODULES

The basic design of COSTAR presents a modular system with a large variety
of available options, allowing the system to be customized to the specific
needs of each group practice. The modules which are basic to the system are:

SECURITY AND INTEGRITY MODULE. These routines, which are an integral part
of all modules, provide for identifying and logging in/out all terminals
and users for the purpose of preventing unauthorized access to medical and
administrative information. The module also provides the support
routines to monitor the functioning of the system, provide transaction
logging, and prevent data loss in case of machine failure.

REGISTRATION MODULE. These interactive routines are used for the entry
and review of all identification data (demographic, insurance, and
administrative) for each patient and family. It is possible for the
practice to select the items to be collected in the registration sequence
from a large menu of pre-coded fields. If necessary, the practice may
also define additional registration items.

SCHEDULE KEEPING MODULE. This set of routines allows on-line booking and
cancellation of appointments, review of current appointments, and production
of legible, accurate schedules and day sheets. For the scheduling of
non-members or new patients a minimal registration sequence is available.

MEDICAL RECORD MODULE. This series of routines provides for data input
from encounter forms, and accessibility to the total medical and
administrative database. Direct inquiry into this database can be done
through computer terminals. The computer-generated medical record is made
available for each scheduled visit. This module represents the core of the
information system and provides a large variety of options for recording,
manipulating, organizing and displaying the data.

BILLING AND ACCOUNTS RECEIVABLE MODULE. This subsystem uses the patient
identification data captured on the encounter form to prepare monthly
statements for each account and, at practice option, to produce superbills
and third party claim forms. Complete accounts receivable audit trails
are maintained and a wide selection of accounting reports is available.

MANAGEMENT REPORTING MODULE. These routines provide pre-programmed, standard
reports (Densen tables [Densen72], utilization and membership reports, and
revenue analysis reports). This module also allows the practice to specify
the parameters for search routines which operate on the database to produce
patient listings, and standardized tabulations and cross-tabulations.

CARDINAL  COSTAR  FEATURES 

We will present the Medical Records Module in more detail below. In the
classical, hand-written medical record, a provider has almost unlimited
freedom of expression, since the record begins with a blank sheet of paper.
In contrast, there are certain procedural rules in COSTAR which inherently
constrain medical recording practices. On the other hand, because of these
restrictions, and because COSTAR is a data-based system rather than a
document-based system, there are significant advantages in terms of
accessibility of recorded information. The procedures and advantages which
most clearly differentiate COSTAR from a manual medical record system are:

The practice must enter at least a minimal set of registration data on each
patient. This provides a single data file which is always available for
patient identification, insurance and billing information, family linkage,
and demographic information. This file can be accessed by authorized users
at remote terminals by either patient number or alphabetic look-up.

Data are collected at each patient visit by recording both administrative
and medical information on a form which is specifically designed for the
needs of the particular medical group and/or specialty. This ENCOUNTER
FORM is the single source document which is designed to capture all data
which providers find necessary, and routinely collect, in clinical practice.
This recording technique facilitates practice efficiency, cost-effectiveness,
and data integrity in that the data from this single document supply the
multiple needs of medical records, accounts receivable, management
reporting, quality assurance, medical audit, and research. The ENCOUNTER
FORM provides for the recording of information in a structured format so
that each particular type of datum (e.g., telephone number, medication) is
uniquely identified. The encounter form is a self-encoding check-list;
the important data elements at each encounter (e.g., names of diagnoses,
medications, procedures, and laboratory tests) are recorded by the provider,
who checks the appropriate item on the form. Next to the box for the
checkmark a 5-character code has been printed. Within the computer-stored
database all information is organized and accessed by the designated
code. Detailed information concerning the particulars of the diagnoses,
therapies, test results, etc., is recorded in narrative text (using
either hand-written notes on the encounter form or associated dictation).
However, this narrative information is linked to the encoded information
and is always accessed and displayed with this code.

Medical record data are provided by a computer-generated printed output for
routine (i.e., scheduled) patient care. In COSTAR, the computer always
prints the most up-to-date information. Several printed copies of the
patient's record may be simultaneously available in different locations. In
contrast to a hand-written record, COSTAR is not restricted to displaying
medical information in the temporal sequence or form in which it was entered;
instead, the computer is programmed to select the pertinent subset of the
data and present this information in different formats according to the needs
of different specialties. The organization of the computer-generated output
emphasizes medical data. The objective of the organization is to present the
information in a form that facilitates scanning of the relevant data in a
minimal period of time. Since this information is always stored in the
computer's files, the computer output can be discarded after use.

COSTAR enables immediate access to all patient and administrative information
through use of one of the computer terminals. Direct inquiry into the
database is simple and rapid; the user responds to a series of questions
posed by the computer, entering on the keyboard the patient's name or
identification number and the type of information desired. Although all the
data are directly accessible, the user may choose to examine only certain
information such as telephone number, most recent visit note, latest
laboratory test values, etc., or may direct the computer to present the
information as a flowchart of particular types of data (e.g., all blood
pressures displayed together with all cardiovascular medications and serum
potassiums). The user interaction at the computer terminal may be an
iterative series of requests resulting in a series of different displays of
patient data. This process is a greatly extended analog of leafing through
a written medical record.

COSTAR is an ACTIVE or RESPONSIVE system in that the processing and display
of information is a function of the content of the data. Because much of
the record is coded, it is possible for the computer to tailor the output
according to the characteristics of the individual patient and of the care
which has been given. This contrasts with a manual system which is a
completely passive archival system and therefore insensitive to the content,
meaning, or significance of the information. The ability of COSTAR to
"understand" the encoded data makes it possible for the physicians to develop
automated programs for quality assurance. Computer programs can be written
to monitor the recorded care of every patient according to standards of care
defined by the particular group practice. Whenever a deviation from the
standard occurs, COSTAR can automatically notify the appropriate physician or
nurse, allowing corrective action to be taken for that particular patient
care situation. Active surveillance and automatic feedback are two features
of COSTAR which cannot be easily duplicated in a manual system and which
represent unique additional capabilities for facilitating patient care.
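
A standard of care expressed as a rule over coded data might look like the sketch below. COSTAR itself is written in MUMPS; this Python fragment only illustrates the idea. The rule format (name, applicability test, satisfaction test), the function deviations, and the 5-character codes are all invented for the example.

```python
def deviations(records, rules):
    """Return (patient, rule name) pairs where a rule applies but is not met."""
    found = []
    for record in records:
        for name, applies, satisfied in rules:
            if applies(record) and not satisfied(record):
                found.append((record["patient"], name))
    return found


# Hypothetical rule: a hypertension diagnosis should be accompanied by
# a recorded blood pressure. The codes are invented for illustration.
rules = [(
    "BP recorded for hypertension",
    lambda r: "HYPER" in r["codes"],
    lambda r: "BLDPR" in r["codes"],
)]
records = [
    {"patient": 1, "codes": {"HYPER", "BLDPR"}},
    {"patient": 2, "codes": {"HYPER"}},
    {"patient": 3, "codes": {"DIABE"}},
]
```

Because the rule tests codes rather than free text, it can be run over every record in the database, and each reported pair identifies the patient whose care should be reviewed.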

COSTAR provides a capability for easy analysis of the database, either
through standardized management reporting programs or via programs which
allow user specification of search strategies and reports. (A major
weakness of manual medical record systems is that it is costly in personnel
time to perform aggregate data analysis of groups of patients; a similar
weakness of most automated accounting systems is that the data are
unavailable except through standard pre-specified reports.) COSTAR provides
an interactive language to allow the non-programmer to generate a variety of
analysis routines or reports by a simple specification of the search
parameters needed to select the desired groups of patients, and a report
generator program to permit the user specification of the listings,
tabulations or cross-tabulations desired.

COSTAR has been designed to be adaptable to a variety of practice settings.
The system can be tailored to the needs of a specific practice because of its
modular construction, and because of its extensive use of directories as the
method of defining the structure and content of the record. Suggested
content for these directories (diagnostic codes and modifiers, medication
terms, laboratory test normal values) is supplied with COSTAR, but the
content may be easily modified or extended by each practice. This permits
each practice to take advantage of the COSTAR system, and yet individualize
actual operation to meet local needs.

PROVIDER  EDUCATION 

One of the dominant limiting factors in the application of computer
technology to medical practice is the necessity of achieving physician
acceptance. COSTAR provides a buffer between the physician and the computer
technology by having the physician record on paper forms which are then
input into the computer system by clerical personnel. The main difference in
recording practice introduced by COSTAR is that information must be recorded
in specific patterns associated with a single medical entity, e.g., all the
clinical findings associated with a specific disease must be associated with
the code for that disease. This philosophy of record-keeping is a modified
form of "problem-orientation" and seems to be acceptable to a wide variety
of practices. COSTAR is designed so that the medical staff can learn to use
the system after only a few minutes to an hour of explanation. However, for
a practice to take full advantage of all the features of COSTAR, it is
necessary that there be a more extensive period of provider education, since
some of the concepts such as the "status" of a diagnosis, or the recording
of modifiers for a diagnosis are unique in comparison to standard medical
record practices. In general, the more the medical staff understands the
functional capability of COSTAR, the more the system can be used to improve
record keeping and patient care.

Interaction  at  the  display  terminal 

Interaction at the terminal is designed to be simple and consistent. For
instance the user is frequently offered the option to edit a previous
response. In all such cases, the old information is presented, enclosed in
carets, e.g., <OLD INFORMATION>, and the new information may be entered
immediately following the ">" symbol. If the current telephone number is
incorrect, the editing would be accomplished as follows:

TELEPHONE <821-3114>821-3141

In this case, the user has typed the correct number followed by pressing the
ENTER key. Retaining the old information is accomplished simply by pressing
the ENTER key without entering any new information.

In some cases, editing consists of removing existing information from the
patient file. For example, if a patient is no longer employed, the office
telephone number should be deleted from the file. This is done by entering
a minus sign:

OFFICE TELEPHONE <965-0811>-

By using the minus sign, the data for the field OFFICE TELEPHONE have been
deleted from this patient's record.
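
The three editing outcomes described above (keep, replace, delete) can be summarized in one small function. This is a sketch only; the name apply_edit and the use of None to represent a deleted field are assumptions, and the real COSTAR dialogue is implemented in MUMPS.

```python
def apply_edit(old_value, response):
    """Return the new field value given what the user typed before ENTER.

    An empty response keeps the old value, a lone minus sign deletes the
    field (returned as None), and anything else replaces the old value.
    """
    response = response.strip()
    if response == "":
        return old_value
    if response == "-":
        return None
    return response
```

For the telephone example, apply_edit("821-3114", "821-3141") yields the corrected number, while apply_edit("965-0811", "-") yields None, meaning the field is removed.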

The  Medical  Record 

The COSTAR Medical Records module is designed to provide the medical practice
with timely and legible medical records. COSTAR improves the accessibility
of medical data by optimizing: a) the availability of the information;
b) the appropriateness of organization; and c) the style of presentation.

The encounter form is the primary document used to record medical data.
The information on the form is structured by data type, and each element is
associated with a code, the date of collection, and the name of the
provider involved. Because of this structure and this coding, it is
possible for COSTAR to generate output which highlights the important
components of the medical information and which is tailored to the needs
of the particular specialty for which the record is being generated.

The computer-generated output is used for the routine care of scheduled
patients, for walk-in patients, for telephone calls, for follow-up of
selected patients, for patients selected as being of particular concern by
the automated quality assurance studies, for consultations between
different providers, and for transmission to other physicians, hospitals,
or insurance companies.

Because of the COSTAR structure and coding, it is also possible to use the
database for quality assurance, for medical audit, for descriptive studies of
the patient population (in terms of patterns of disease, treatments given,
and outcomes) and for medical research. Although the codes are unique to
COSTAR, the system contains translation tables which can convert COSTAR
codes to the coding system required by the particular third party carrier.

Output Documents

There are three different types of output provided by COSTAR: ENCOUNTER
REPORT, FLOWCHART, and STATUS REPORT. For scheduled visits a combination
of these three different types is routinely generated, depending on the
particular needs of the specialty and the practice.

Encounter Report. This computer-generated report is equivalent to a medical
"note" reflecting the activity at a single visit or encounter with a patient.
This ENCOUNTER REPORT displays in a standard format both those data collected
and entered via a single encounter form, and the data reflecting laboratory
test results associated with that encounter, which may be entered separately.
The data from each such encounter form may be retrieved as an encounter
report, which is identified by patient, date and provider. The encounter
report displays all codes entered on that visit, with the status flag(s),
modifier(s), associated text and/or result(s). All data are presented in
the following sequence:


A. Patient Identification         F. Physical Examination Data
B. Encounter Identification       G. Medications and Therapies
C. Provider Name(s)               H. Procedures
D. Disposition(s)                 I. Laboratory Tests
E. Diagnoses and Problems         J. Administrative Data

Flowcharts. This form of computer-generated report emphasizes the temporal
course of the disease process or the variation in clinical findings over
time. The display is a chronological listing, by date, of all occurrences of
particular coded items with associated text and/or results. The medical
practice may create any number of flowchart format "templates" that specify
which COSTAR codes are to be displayed, and the output format of the report.
The template is organized by columns. Each column may include one or more
codes. For example, the report generated from a sample template intended
for follow-up of hypertensive patients contains columns labeled WGT, BLOOD
PRESSURE, CREA, URIC ACID, and K+; and includes all statuses, results, and
textual information associated with the COSTAR codes for weight, blood
pressure, serum creatinine, serum uric acid and serum potassium respectively.
The column marked MEDICATIONS includes many anti-hypertensive drugs. When
multiple codes are specified for one column the name of each code is given
in the flowchart.

A template may have any number of associated "trigger" codes. The presence
of any one or more of these codes in a patient record will cause the
corresponding flowchart to be generated whenever a STATUS REPORT is printed.
For example, the diagnosis of hypertension could be given as a trigger for
the hypertension flowchart template.
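
The template and trigger mechanism can be pictured with a small sketch. It is an illustration under stated assumptions rather than COSTAR's MUMPS implementation: the dictionary layout, the functions is_triggered and flowchart, and the 5-character codes are hypothetical, and only three of the columns named above are modeled.

```python
# Hypothetical template: trigger codes plus column label -> codes shown there.
TEMPLATE = {
    "triggers": {"HYPER"},
    "columns": {"WGT": ["WEIGH"], "BLOOD PRESSURE": ["BLDPR"], "K+": ["POTAS"]},
}


def is_triggered(template, patient_codes):
    """True if any trigger code appears anywhere in the patient record."""
    return bool(template["triggers"] & set(patient_codes))


def flowchart(template, entries):
    """Build chronological rows of (date, {column label: result})."""
    column_of = {code: label
                 for label, codes in template["columns"].items()
                 for code in codes}
    rows = {}
    for day, code, result in entries:
        if code in column_of:
            rows.setdefault(day, {})[column_of[code]] = result
    return sorted(rows.items())


# Invented entries (date, code, result), deliberately out of date order.
entries = [
    ("1978-05-02", "BLDPR", "150/95"),
    ("1978-03-01", "HYPER", ""),
    ("1978-03-01", "BLDPR", "160/100"),
    ("1978-03-01", "WEIGH", "82 kg"),
]
```

The sketch keeps one result per column per date; a fuller version would also carry the status and text of each entry and, for multi-code columns, the name of the code.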

Status Report. The STATUS REPORT serves both as an index to the content
of the computer-based medical record and also as a summary of the most
recently collected data. The STATUS REPORT consists of seven components:

The HEADER information which contains the patient identifying information,
demographic and personal data.

The DISPOSITION information which represents the plan (e.g., future
appointments) with this patient.

The DIAGNOSIS or Problem information which represents the medical assessment
entered by the physicians or nurses who have given care to this patient.

The PHYSICAL EXAM data (e.g., vital signs) which are available on this
patient.

The THERAPY section which lists the Medications and Therapies prescribed
for this patient.

The PROCEDURE section which lists operations, immunizations, invasive
tests, etc.

The TEST RESULTS section which lists the most recent test results on
the patient. Abnormal test results are flagged with an asterisk.
If the patient has ever had an abnormal result for a particular test,
a flowchart of the five most recent results for that test is displayed.

The STATUS REPORT gives the date the particular medical item was first
mentioned (e.g., the date a specific diagnostic term was first used),
the number of encounters at which the item had been checked, and the
last date at which the particular item was mentioned. Detailed information,
as free text, is given only for the most recent instance in which such text
was associated with that particular code.
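
The per-item figures given by the STATUS REPORT (first date, number of encounters, last date, most recent text) can be derived from the coded entries as sketched below. The function summarize and the entry layout are assumptions for illustration; the sketch also assumes that each code is entered at most once per encounter, so that counting entries counts encounters.

```python
def summarize(entries):
    """Per code: first date, number of encounters, last date, latest text."""
    summary = {}
    for day, code, text in sorted(entries):
        item = summary.setdefault(
            code, {"first": day, "count": 0, "last": day, "text": None})
        item["count"] += 1
        item["last"] = day
        if text:
            item["text"] = text  # keep only the most recent non-empty text
    return summary


# Invented entries (date, code, free text), deliberately out of date order.
entries = [
    ("1978-05-02", "HYPER", ""),
    ("1978-03-01", "HYPER", "mild, diet advised"),
    ("1978-04-10", "HYPER", "started diuretic"),
]
```

Note that the text retained is from the April visit, not the May visit: the most recent encounter carried no text, so the report falls back to the latest instance that did.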


System  Summary 

COSTAR is programmed in Standard MUMPS and can be supported by any
computer system configuration that supports Standard MUMPS. COSTAR is
designed to take advantage of the recent advances in computer hardware
technology which have resulted in a dramatic reduction in the cost of the
computer processor and disk storage. It is anticipated that in most
practices, COSTAR will be an in-house system with a variety of peripherals
and storage capability based upon practice needs and requirements.
Depending upon the size of the system and the configuration chosen, the
system cost will probably be between $75,000 and $200,000, with the
smaller system being appropriate for a small group practice (e.g., five
physicians) and the larger systems being used for practices of 15 or more
physicians. This price should make COSTAR a cost-effective alternative
for many offices currently using manual or partial service-bureau systems.
The computer configuration on which COSTAR is now being implemented is a
Digital Equipment Corporation PDP-11.

IV. E Hospital Systems: PATIENT ORDER MANAGEMENT AND COMMUNICATION SYSTEM

(Used at the Coral Gables Variety Childrens Hospital)

[This material is based on descriptive material provided by Dynamic Control
and IBM.]

The Patient Order Management and Communication System (POMCS) was
developed as a hospital-wide computer information system that provides a
communication link between the admissions office, nursing stations,
ancillaries, and the accounting department. It was developed by Dynamic
Control Corporation of Coral Gables, Florida and is installed at Variety
Childrens Hospital to facilitate, expedite, and integrate the delivery of
health care services and the operation of the hospital. This hospital is a
1-88 bed facility with an average stay of 9 days. Doctor's orders are
entered through video displays at the nursing stations. The orders are
automatically transmitted to the appropriate ancillary departments and
added to the patient's record. These departments have the ability to
display work to be done and to enter results. Results may then be
transmitted back to the nursing station in hardcopy form to become a part
of the chart. The result also becomes a part of the patient's record and
is available for inquiry through a video display to authorized personnel.
Cumulative summaries may also be produced. Charges for patient services
are automatically collected and made available to the accounting
system.

The  system  now  provides  support  for  the  ancillary  departments  at  Variety 
Childrens Hospital. The system is designed to be utilized in a modular
fashion, so if installed at another hospital it can support those
departments deemed necessary there.

OBJECTIVES 

The objectives of POMCS are to increase revenue, raise personnel
productivity, save costs, and improve the quality of patient care,
through the following:

Revenue  Increases 

1.  Automatic  generation  of  patient  charges  and  control 

2.  Reduced  forms  cost 

3.  Accountability  of  floor  stock 

4.  Significantly  reduced  lost  and  late  charges 

5.  Automatic  pricing  and  control  of  prices 

6. On-line census information for improved bed utilization

7. Charges can be generated immediately for outpatients and
receipts can be entered on-line.

Personnel  Productivity  and  Cost  Savings 

1. Reduction of clerical activity for nursing by reducing
the need to transcribe orders to multiple working documents

2. Reduced entry costs due to the use of display menus for data collection

3. Standard ordering procedures

4. Automatic communication of test results to nursing units

5. Improved document legibility

6. Errors resulting from order transcription can be reduced

Patient Care Quality

1. More nursing time for patient care, through reduction of
clerical work.

2.  Improved  control  of  order  status  from  order  time  to  receipt 
of  result. 

3.  More  expedient  order  processing  by  eliminating  the  transfer 
of  forms  between  requester  and  provider. 

4.  Cumulative  result  summaries  for  the  medical  record. 

5.  Duplicate  orders  can  be  eliminated. 

6.  Inquiry  as  required  to  determine  order  status  and/or  results. 

7.  Reduce  order  rejects  because  of  legibility. 

SYSTEM  DESCRIPTION 

POMCS uses an IBM System 32 to support a network of interactive display
terminals and printers which link nursing stations and the major service
areas of the hospital. The Patient Order Management and Communication
System is written in RPG II.

FUNCTIONS 

POMCS includes four major on-line interactive functions:

1. Admission/Discharge/Transfer

2. Order Communications

3. Patient Billing & Accounts Receivable

4. Outpatient/Emergency Room Registration

The program modules which implement these functions are described below.

ON-LINE PRE-ADMISSION AND ADMISSION, DISCHARGE, TRANSFER, AND CENSUS:
Patients are preadmitted, admitted, transferred, or discharged using display
terminals via interactive screen data entry and editing. Patient data are
entered only once, at pre-admission time. Demand census from a display
station provides current information on patient location.

ON-LINE OUTPATIENT REGISTRATION: The system provides outpatient and
emergency room registration for prior patients, as well as new patients.
On-demand outpatient billing information is also available to facilitate
collection from outpatients as they leave the facility.

ON-LINE  ORDER  ENTRY  AND  ORDER  CONTROL:  Orders  may  be  entered  through  display 
terminals  utilizing  successive  menus  or  from  a  single  screen  If  service 
codes  are  known.  The  status  of  orders  may  be  tracked  from  the  time  of  entry 
until  acknowledgement  or  result  entry  by  the  ancillary  department  occurs. 

1. Order Entry: The use of standard default options with
override capabilities permits fast, accurate, and
individually tailored order entry.

2.  Order  Status:  Orders  may  be  displayed  for  a  patient  or 
an  ancillary  department. 

3. Order Acknowledgement: Acknowledgements can be placed in the
system by the respective service departments upon receipt or
completion of the ordered service. The order is then flagged as
complete, charges are posted to the patient account, and
the transactions are passed to the billing system.
4. Order Result Entry: Certain orders, such as laboratory orders,
may be coded as requiring results. In place of acknowledging
the completion of the service, results may be entered using
a number of user-defined formats. Once entered, the results
are printed at the patient's nursing unit.

5. Repeat Orders: Repeat orders need to be entered only once
because the system will regenerate them for the time period required.

6.  Order  Maintenance:  Orders  requiring  only  an  acknowledgement 
may  be  altered  to  change  the  service  provided  before  the 
charge  is  entered. 
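
The order life cycle described in items 1 through 6 can be pictured as a
small state machine. The following is a minimal sketch in Python, not the
RPG II implementation used by POMCS. The names Order, OrderBook, and
OrderStatus, and the in-memory list standing in for the order file, are
assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class OrderStatus(Enum):
    ENTERED = "entered"            # placed from a display terminal
    ACKNOWLEDGED = "acknowledged"  # service department confirmed the service
    RESULTED = "resulted"          # service department entered a result


@dataclass
class Order:
    patient_id: str
    department: str                # ancillary department that fills the order
    service_code: str
    requires_result: bool = False  # e.g. laboratory orders
    status: OrderStatus = OrderStatus.ENTERED
    result: Optional[str] = None


class OrderBook:
    """In-memory stand-in for the order file (hypothetical)."""

    def __init__(self) -> None:
        self.orders: List[Order] = []

    def enter(self, order: Order) -> Order:
        self.orders.append(order)
        return order

    def _close(self, order: Order, status: OrderStatus) -> None:
        # An order can leave the ENTERED state only once.
        if order.status is not OrderStatus.ENTERED:
            raise ValueError("order is already closed")
        order.status = status

    def acknowledge(self, order: Order) -> None:
        # Orders coded as requiring results are closed by result entry instead.
        if order.requires_result:
            raise ValueError("this order must be closed by entering a result")
        self._close(order, OrderStatus.ACKNOWLEDGED)

    def enter_result(self, order: Order, result: str) -> None:
        if not order.requires_result:
            raise ValueError("this order only needs an acknowledgement")
        self._close(order, OrderStatus.RESULTED)
        order.result = result

    def outstanding(self, patient_id: Optional[str] = None,
                    department: Optional[str] = None) -> List[Order]:
        # Order status display: by patient or by ancillary department.
        return [o for o in self.orders
                if o.status is OrderStatus.ENTERED
                and (patient_id is None or o.patient_id == patient_id)
                and (department is None or o.department == department)]
```

In this sketch an order is outstanding from entry until the ancillary
department either acknowledges it or enters a result, which mirrors the
tracking interval described above. Repeat orders and order maintenance are
left out to keep the example short.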

ON-LINE RESULT REPORTING: The system provides for the posting of results
to orders. Once the results have been entered by the service department,
they are printed immediately at the patient's nursing station. The results
of that day are combined to produce daily and final cumulative reports.
The form of these reports provides a tabular graph of similar results over
time. Results may be displayed and corrected for a specific patient by the
ancillary department; however, the original result reported remains part
of the Patient Master Record for audit purposes.
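
One way to let a department correct a result while keeping the originally
reported value for audit is to append corrections rather than overwrite.
The sketch below is illustrative only; the ResultRecord class and its field
names are assumptions and do not describe the actual layout of the Patient
Master Record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultRecord:
    order_id: str
    original: str  # first value reported; never overwritten
    corrections: List[str] = field(default_factory=list)

    @property
    def current(self) -> str:
        # The most recent correction wins; otherwise the original stands.
        return self.corrections[-1] if self.corrections else self.original

    def correct(self, new_value: str) -> None:
        # Corrections are appended, so the audit trail stays intact.
        self.corrections.append(new_value)
```

A display program would show the current value, while an audit report would
list the original value followed by each correction in order.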

ORDER ENTRY CHARGE COLLECTION: The Catalog file contains prices for patient
services and indicates when to post the charge, i.e., at order time,
acknowledgement time, or at result reporting time. For an entered order,
the charge is automatically posted at the specified time.
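
The charge posting rule can be sketched as a lookup against the Catalog
entry for the ordered service. This is a minimal illustration in Python
under stated assumptions: the CatalogEntry fields, the PostAt values, the
service code, and the price shown in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Dict, List, Tuple


class PostAt(Enum):
    ORDER = "order"
    ACKNOWLEDGEMENT = "acknowledgement"
    RESULT = "result"


@dataclass(frozen=True)
class CatalogEntry:
    service_code: str
    price: Decimal
    post_at: PostAt  # the event at which the charge is posted


def post_charge_if_due(catalog: Dict[str, CatalogEntry],
                       service_code: str,
                       event: PostAt,
                       ledger: List[Tuple[str, Decimal]]) -> bool:
    """Append the charge to the ledger if this event is the posting time."""
    entry = catalog[service_code]
    if entry.post_at is event:
        ledger.append((service_code, entry.price))
        return True
    return False
```

For example, with a Catalog entry CatalogEntry("CBC", Decimal("12.50"),
PostAt.RESULT), calling post_charge_if_due at order time posts nothing, and
calling it again at result reporting time posts the charge once.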

ON-LINE FILE MAINTENANCE: The system contains facilities for on-line
maintenance of the Doctor, Insurance, Catalog, Screen, and A/R files. Each
ancillary department may update its own section of the Catalog file.
Price changes are restricted by a security code. Departments may also alter
the format of their display screens as necessary. Each ancillary
department should be responsible for maintenance of its own files.

MESSAGE PROCESSING: During order processing, free-form text messages
such as "patient needs wheelchair" may be included. A message file
is provided from which automatic messages can be generated with certain
orders by placing appropriate codes in the Catalog file.
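
The relationship between message codes in the Catalog file and the message
file can be sketched as two lookups. The table contents, codes, and the
function name below are hypothetical and serve only to show the idea; the
wheelchair text is the one example the description above supplies.

```python
from typing import Dict, List, Optional

# Hypothetical message file: message code -> message text.
MESSAGE_FILE: Dict[str, str] = {
    "WC": "patient needs wheelchair",
    "ISO": "patient is in isolation",
}

# Hypothetical Catalog excerpt: service code -> automatic message codes.
CATALOG_MESSAGE_CODES: Dict[str, List[str]] = {
    "CHEST-XRAY": ["WC"],
    "CBC": [],
}


def messages_for_order(service_code: str,
                       free_text: Optional[str] = None) -> List[str]:
    """Collect automatic messages for a service, then any free-form text."""
    codes = CATALOG_MESSAGE_CODES.get(service_code, [])
    messages = [MESSAGE_FILE[code] for code in codes]
    if free_text:
        messages.append(free_text)
    return messages
```

Keeping the message text in its own file means a department can reword a
message once and have every order that carries the code pick up the change.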

PATIENT HISTORY FILE: A file is available in the system to maintain
patient records. Should a former patient return to the facility as an
inpatient or outpatient, the data on the patient need merely be reviewed
or updated before the system automatically prepares the new registration
or admission forms.

PATIENT BILLING: The system includes the following billing functions:

Demand Bill Inquiry: The business office may display a
patient's bill at any time.

Outpatient Billing: Bills are printed daily to show the
itemized charges for that day. Follow-up mailers can be
printed as required.

Inpatient Billing: Final bills will usually be requested by
the business office. Prorated patient bills for insurance
coverage are printed after the final bill is requested.
ACCOUNTS RECEIVABLE: The Accounts Receivable functions in the system
include Bad Debts Reporting, Patient Trial Balances, Insurance Company
Reports, Cash Receipt Reports, and Patient Accounts Receivable
Activity Reports.

PATIENT CENSUS: A patient-by-room census is available as an on-line function.
Daily batch census reports can include ADT logs and alphabetic and numeric
patient room reports.

SYSTEM SECURITY AND CONTROL: The basis of control in the system is a
non-displayed operator security code. After signing on to the system, the
operator may perform only those functions permitted by that security
code. Each program has its own allowable security code requirements.
Cancellation of orders is controlled by the security code. All order
entry transactions entered into the system are logged by date and time of
entry, and identified with the security code of the person performing
the transaction. All other transactions are logged by date and
security code.
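
The control scheme above combines a per-program check of the operator's
security code with a transaction log. The following Python sketch shows one
way such a check could work. The program names, the security codes, and the
PROGRAM_ACCESS table are invented for illustration and are not taken from
POMCS.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set

# Hypothetical table: program name -> security codes allowed to run it.
PROGRAM_ACCESS: Dict[str, Set[str]] = {
    "ORDER_ENTRY": {"N01", "N02"},
    "ORDER_CANCEL": {"N02"},
    "PRICE_UPDATE": {"A01"},
}

# Programs whose transactions are logged with time of entry as well as date.
ORDER_ENTRY_PROGRAMS: Set[str] = {"ORDER_ENTRY", "ORDER_CANCEL"}


def run_program(program: str, security_code: str,
                log: List[Dict[str, str]],
                now: Optional[datetime] = None) -> Dict[str, str]:
    """Check the operator's code against the program, then log the transaction."""
    if security_code not in PROGRAM_ACCESS.get(program, set()):
        raise PermissionError("security code not permitted for " + program)
    now = now or datetime.now()
    entry = {
        "program": program,
        "date": now.date().isoformat(),
        "code": security_code,
    }
    # Order entry transactions also record the time of entry.
    if program in ORDER_ENTRY_PROGRAMS:
        entry["time"] = now.strftime("%H:%M:%S")
    log.append(entry)
    return entry
```

In this sketch an operator with code N01 can enter orders but cannot cancel
them, which reflects the statement that cancellation is controlled by the
security code.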

REPORTS: Some of the reports which can be generated by the system include:

Patient lists
Doctor lists
Lab Flow Sheets
Order Result Summaries
Orders outstanding by department, nursing station and patient
Patient Bills and Billing reports
Outpatient reports, including patients registered during the day with no charges entered
Patient Census reports
Bad debt reports
Cash receipts reports
Patient Trial Balances
Insurance Company Accounts Receivable
Transaction journals
Department Journals
Admission forms


SUMMARY 

This system provides an example of the new generation of hospital
information systems. It runs on a relatively inexpensive computer,
so that it is feasible for the hospital to own the equipment and hence
control its expenditures to a large extent. The hospital has no
programming personnel and relies wholly on a software vendor. Once
a basic and reliable operation is established, system improvements
have to be negotiated with the vendor. While this causes some delay, it
also makes the hospital administration aware of the costs associated
with new and changed software specifications, a problem commonly
underestimated when software is written and maintained in-house.